To my grandchildren, Ida and Isaac


Preface 


This book is a condensed version of my Mathematics and Its History, 
which has reached a third edition and is now too encyclopedic/overweight 
to be covered in a single course. Since I feel strongly that a single course 
overview of undergraduate mathematics is more desirable today than ever 
before, I have decided to cut Mathematics and Its History down to size. 
Hopefully, this will also make the book more cohesive, with everything 
connected to everything else. What I said in the Preface to the first edition 
still applies: 


One of the disappointments experienced by most mathematics 
students is that they never get a course on mathematics. They 
get courses in calculus, algebra, topology, and so on, but the 
division of labor in teaching seems to prevent these different 
topics from being combined into a whole. In fact, some of the 
most important and natural questions are stifled because they 
fall on the wrong side of topic boundary lines. Algebraists do 
not discuss the fundamental theorem of algebra because 
“that’s analysis” and analysts do not discuss Riemann surfaces 
because “that’s topology,” for example. Thus if students are to 
feel they really know mathematics by the time they graduate, 
there is a need to unify the subject. 


This book aims to give a unified view of undergraduate math- 
ematics by approaching the subject through its history. Since 
readers should have had some mathematical experience, certain
basics are assumed and the mathematics is not developed
formally as in a standard text. On the other hand, the mathe- 
matics is pursued more thoroughly than in most general histo- 
ries of mathematics, because mathematics is our main goal and 
history only the means of approaching it. Readers are assumed 
to know basic calculus, algebra, and geometry, to understand 
the language of set theory, and to have met some more ad- 
vanced topics such as group theory, topology, and differential 
equations. I have tried to pick out the dominant themes of this 
body of mathematics, and to weave them together as strongly 
as possible by tracing their historical development. 


Some historians of mathematics may object to my anachro- 
nistic use of modern notation and (fairly) modern interpreta- 
tions of classical mathematics. This has certain risks, such as 
making the mathematics look simpler than it really was in its 
time, but the risk of obscuring ideas by cumbersome, unfamil- 
iar notation is greater, in my opinion. Indeed, it is practically 
a truism that mathematical ideas generally arise before there 
is notation or language to express them clearly, and that ideas 
are implicit before they become explicit. Thus the historian, 
who is presumably trying to be both clear and explicit, often 
has no choice but to be anachronistic when tracing the origins 
of ideas. 


Mathematicians may object to my choice of topics, since a 
book of this size is necessarily incomplete. My preference has 
been for topics with elementary roots and strong interconnec- 
tions. The major themes are the concepts of number and space: 
their initial separation in Greek mathematics, their union in the 
geometry of Fermat and Descartes, and the fruits of this union 
in calculus and analytic geometry. Certain important topics of 
today, such as Lie groups and functional analysis, are omitted 
on the grounds of their comparative remoteness from elemen- 
tary roots. Others, such as probability theory, are mentioned 
only briefly, as most of their development seems to have oc- 
curred outside the mainstream. For any other omissions or 
slights I can only plead personal taste and a desire to keep the 
book within the bounds of a one- or two-semester course. 

I would only add that I am hoping in fact to stay within the bounds of
a one-semester course. Thus the content is now somewhat less than in the 
first edition of Mathematics and Its History, or at least more compact. In 
particular, I have dropped the biographical sketches that took up about 20% 
of the book, since short mathematical biographies are now widely available 
at sites such as 

http://www-history.mcs.st-and.ac.uk/BiogIndex.html

On the other hand, there are many more exercises than in the first edition, 
so instructors will have considerable freedom in assigning problems. Also, 
many of the black-and-white line drawings from earlier editions have been 
improved or completely replaced by new ones with color, and in many 
cases with 3D modeling using the excellent free software POV-Ray. These 
enhancements should make the diagrams easier to “read.” 

Much of the material in this condensed version is taken from the full 
Mathematics and Its History, Stillwell (2010a). However, most of Chapter 
16 is new, and there are several new sections or subsections in other chap- 
ters. In addition, hundreds of small changes and additions have been made 
to improve clarity and to add new information. 

As always, I thank my wife Elaine for her meticulous proofreading. I 
also thank the anonymous referees for numerous corrections and improve- 
ments, and Loretta Bartolini for expertly coordinating the production of the 
book. Many thanks also go to Rossella Lupacchini for locating a crucial 
picture in a Bombelli manuscript in Bologna. 


John Stillwell 
South Melbourne, June 2020 
San Francisco, September 2019 
The Theorem of Pythagoras


PREVIEW 


The Pythagorean theorem is the most appropriate starting point for a book 
on mathematics and its history. It is not only the oldest mathematical the- 
orem, but also the source of three great streams of mathematical thought: 
numbers, geometry, and infinity. 

The number stream begins with Pythagorean triples: triples of integers
$(a, b, c)$ such that $a^2 + b^2 = c^2$. The geometry stream begins with the
interpretation of $a^2$, $b^2$, and $c^2$ as squares on the sides of a right-angled
triangle with sides a, b, and hypotenuse c. The infinity stream begins with
the discovery that $\sqrt{2}$, the hypotenuse of the right-angled triangle whose
other sides are of length 1, is an irrational number.

These three streams are followed separately through Greek mathemat- 
ics in Chapters 2, 3, and 4. The geometry stream resurfaces in Chapter 6, 
where it takes an algebraic turn. The basis of algebraic geometry is the 
possibility of describing points by numbers—their coordinates—and the 
bridge between coordinates and geometry is precisely the Pythagorean the- 
orem, which defines length in terms of coordinates. 

The Pythagorean theorem resurfaces in a new algebraic role in 
Chapter 16. Here it appears in the guise of the inner product, which 
introduces the concepts of length and angle into vector spaces. 


1.1 Arithmetic and Geometry 


If there is one theorem known to all mathematically educated people, it 
is surely the theorem of Pythagoras. It will be recalled as a property of 
right-angled triangles: the square of the hypotenuse equals the sum of the 
squares of the other two sides (Figure 1.1). The “sum” is of course the sum 
of areas and the area of a square of side $l$ is $l^2$, which is why we call it “$l$
squared.” Thus the Pythagorean theorem can also be expressed by


$$a^2 + b^2 = c^2, \tag{1}$$


where a, b, c are the side lengths of the red triangle in Figure 1.1. 


Figure 1.1: The Pythagorean theorem 


Conversely, a solution of (1) by positive numbers a, b, c can be realized 
by a right-angled triangle with sides a, b and hypotenuse c. It is clear that we
can draw perpendicular sides a, b for any given positive numbers a, b, and 
then the hypotenuse c must be a solution of (1) to satisfy the Pythagorean 
theorem. This converse view of the theorem becomes interesting when we 
notice that (1) has some very simple solutions. For example, 


$$(a, b, c) = (3, 4, 5), \qquad (3^2 + 4^2 = 9 + 16 = 25 = 5^2),$$
$$(a, b, c) = (5, 12, 13), \qquad (5^2 + 12^2 = 25 + 144 = 169 = 13^2).$$


It is thought that in ancient times such solutions may have been used for 
the construction of right angles. For example, by stretching a closed rope 
with 12 equally spaced knots one can obtain a (3,4, 5) triangle with right 
angle between the sides 3, 4, as seen in Figure 1.2. 
Figure 1.2: Right angle by rope stretching


Whether or not this is a practical method for constructing right angles, 
the very existence of a geometrical interpretation of a purely arithmetical 
fact like 

$$3^2 + 4^2 = 5^2$$


is quite wonderful. At first sight, arithmetic and geometry seem to be com- 
pletely unrelated realms. Arithmetic is based on counting, the epitome of a 
discrete (or digital) process. The facts of arithmetic can be clearly under- 
stood as outcomes of certain counting processes, and one does not expect 
them to have any meaning beyond this. Geometry, on the other hand, involves 
continuous rather than discrete objects, such as lines, curves, and surfaces. 
Continuous objects cannot be built from simple elements by discrete pro- 
cesses, and one expects to see geometrical facts rather than arrive at them 
by calculation. 

The Pythagorean theorem was the first hint of a hidden, deeper relation- 
ship between arithmetic and geometry, and it has continued to hold a key 
position between these two realms throughout the history of mathematics. 
This has sometimes been a position of cooperation and sometimes one of 
conflict, as followed the discovery that $\sqrt{2}$ is irrational (see Section 1.5). It
is often the case that new ideas emerge from such areas of tension, resolving 
the conflict and allowing previously irreconcilable ideas to interact fruit- 
fully. The tension between arithmetic and geometry is, without doubt, the 
most profound in mathematics, and it has led to the most profound the- 
orems. Since the Pythagorean theorem is the first of these, and the most 
influential, it is a fitting subject for our first chapter. 
1.2 Pythagorean Triples


Pythagoras lived around 500 BCE, but the story of the Pythagorean theorem
begins long before that, at least as far back as 1800 BCE in Babylonia. The
evidence is a clay tablet, known as Plimpton 322, which systematically lists
a large number of integer pairs (a, c) for which there is an integer b satisfying


$$a^2 + b^2 = c^2. \tag{1}$$


A translation of this tablet, together with its interpretation and historical 
background, was first published by Neugebauer and Sachs (1945). Inte- 
ger triples (a,b,c) satisfying (1)—for example, (3,4,5), (5,12, 13), 
(8, 15, 17)—are now known as Pythagorean triples. Presumably the Baby- 
lonians were interested in them because of their interpretation as sides of 
right-angled triangles, though this is not known for certain. At any rate, the 
problem of finding Pythagorean triples was considered interesting in other 
ancient civilizations that are known to have possessed the Pythagorean theorem;
van der Waerden (1983) gives examples from China (between 200
BCE and 220 CE) and India (between 500 and 200 BCE). The most complete
understanding of the problem in ancient times was achieved in Greek mathematics,
between Euclid (around 300 BCE) and Diophantus (around 250 CE).
A general formula for generating Pythagorean triples is 


$$a = (p^2 - q^2)r, \qquad b = 2qpr, \qquad c = (p^2 + q^2)r.$$


It is easy to see that $a^2 + b^2 = c^2$ when a, b, c are given by these formulas, and
of course a, b, c will be integers if p, q, r are. Even though the Babylonians
did not have the advantage of our algebraic notation, it is plausible that this 
formula, or the special case 


$$a = p^2 - q^2, \qquad b = 2pq, \qquad c = p^2 + q^2$$


(which gives all solutions a, b, c, without common divisor and b even) 
was the basis for the triples they listed. Less general formulas have been 
attributed to Pythagoras himself (around 500 BCE) and Plato (see Heath
(1921), Vol. 1, pp. 80-81); a solution equivalent to the general formula is 
given in Euclid’s Elements, Book X (lemma following Prop. 28). As far as 
we know, this is the first statement of the general solution and the first proof 
that it is general. Euclid’s proof is essentially arithmetical, as one would 
expect since the problem seems to belong to arithmetic. 
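
The general formula above can be tried out numerically. The following is a minimal sketch in Python, not anything from the historical sources: the function names `euclid_triple` and `primitive_triples` are made up for illustration, and the filter used to pick out triples without common divisor (p and q coprime and of opposite parity) is the standard condition from number theory, assumed here rather than derived in the text.

```python
from math import gcd

def euclid_triple(p, q, r=1):
    """Return (a, b, c) from the general formula, for integers p > q > 0."""
    a = (p * p - q * q) * r
    b = 2 * q * p * r
    c = (p * p + q * q) * r
    return a, b, c

def primitive_triples(limit):
    """Yield triples with r = 1 for 2 <= p <= limit and 1 <= q < p.

    Assumption (standard, not proved here): a, b, c have no common
    divisor exactly when p, q are coprime and of opposite parity.
    """
    for p in range(2, limit + 1):
        for q in range(1, p):
            if gcd(p, q) == 1 and (p - q) % 2 == 1:
                yield euclid_triple(p, q)

triples = list(primitive_triples(4))
# Every triple produced satisfies a^2 + b^2 = c^2 and has b even.
for a, b, c in triples:
    assert a * a + b * b == c * c and b % 2 == 0
```

With `limit = 4` the list contains the familiar triples (3, 4, 5) and (5, 12, 13), together with (15, 8, 17) and (7, 24, 25).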
However, there is a far more striking solution, which uses the geometric
interpretation of Pythagorean triples. This emerges from the work of
Diophantus, and it is described in the next section. 

EXERCISES 


The integer pairs (a, c) in Plimpton 322 are shown in Figure 1.3. 


    a |     c
------+------
  119 |   169
 3367 |  4825
 4601 |  6649
12709 | 18541
   65 |    97
  319 |   481
 2291 |  3541
  799 |  1249
  481 |   769
 4961 |  8161
   45 |    75
 1679 |  2929
  161 |   289
 1771 |  3229
   56 |   106


Figure 1.3: Pairs in Plimpton 322 


1.2.1 For each pair (a, c) in the table, compute $c^2 - a^2$, and confirm that it is a
perfect square, $b^2$. (Computer assistance is recommended.)
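
One possible form of the computer assistance is sketched below in Python. The pairs are copied from Figure 1.3; the names `PLIMPTON_PAIRS` and `third_side` are hypothetical and chosen only for this illustration.

```python
from math import isqrt

# Pairs (a, c) as listed in Figure 1.3.
PLIMPTON_PAIRS = [
    (119, 169), (3367, 4825), (4601, 6649), (12709, 18541), (65, 97),
    (319, 481), (2291, 3541), (799, 1249), (481, 769), (4961, 8161),
    (45, 75), (1679, 2929), (161, 289), (1771, 3229), (56, 106),
]

def third_side(a, c):
    """Return the integer b with a^2 + b^2 = c^2, or None if there is none."""
    d = c * c - a * a
    b = isqrt(d)            # integer square root, rounded down
    return b if b * b == d else None

b_values = [third_side(a, c) for a, c in PLIMPTON_PAIRS]
for (a, c), b in zip(PLIMPTON_PAIRS, b_values):
    print(a, b, c)
```

Using `isqrt` keeps the whole check in exact integer arithmetic, so no rounding error can make a non-square look like a square.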


You should notice that in most cases b is a “rounder” number than a or c. 


1.2.2 Show that most of the numbers b are divisible by 60, and that the rest are 
divisible by 30 or 12. 


Such numbers were in fact exceptionally “round” for the Babylonians, because 60 
was the base for their system of numerals. It looks like they computed Pythagorean 
triples starting with the “round” numbers b and that the column of b values later 
broke off the tablet. 

Euclid’s formula for Pythagorean triples comes out of his theory of divisibil- 
ity, which we take up in Section 3.3. Divisibility is also involved in some basic 
properties of Pythagorean triples, such as their evenness or oddness. 


1.2.3 Show that any integer square leaves remainder 0 or 1 on division by 4. 


1.2.4 Deduce from Exercise 1.2.3 that if (a,b,c) is a Pythagorean triple then a 
and b cannot both be odd. 
1.3 Rational Points on the Circle


We know from Section 1.1 that a Pythagorean triple (a, b, c) can be realized 
by a triangle with sides a, b and hypotenuse c. This in turn yields a triangle 
with fractional (or rational) number sides x = a/c, y = b/c and hypotenuse 
1. All such triangles can be fitted inside the circle of radius 1 as shown in 
Figure 1.4. The sides x and y become what we now call the coordinates of
the point P on the circle.


Figure 1.4: The unit circle


The Greeks did not use this language, but they
could derive the relationship between x and y we call the equation of the
circle. Since


$$a^2 + b^2 = c^2 \tag{1}$$


we have


$$\left(\frac{a}{c}\right)^2 + \left(\frac{b}{c}\right)^2 = 1,$$


so the relationship between x = a/c and y = b/c is


$$x^2 + y^2 = 1. \tag{2}$$


Consequently, finding integer solutions of (1) is equivalent to finding ratio- 
nal solutions of (2), or finding rational points on the curve (2). 
Such problems are now called Diophantine, after Diophantus, who was
the first to deal with them seriously and successfully. Diophantine equa- 
tions have acquired the more special connotation of equations for which 
integer solutions are sought, although Diophantus himself sought only ratio- 
nal solutions. (There is an interesting open problem that turns on this dis- 
tinction. Matiyasevich (1970) proved that there is no algorithm for deciding 
which polynomial equations have integer solutions. It is not known whether 
there is an algorithm for deciding which polynomial equations have ratio- 
nal solutions.) 


Most of the problems solved by Diophantus involve quadratic or cubic 
equations, usually with one obvious trivial solution. Diophantus used the 
obvious solution as a stepping stone to the nonobvious, but no account of his 
method survived. It was ultimately reconstructed by Fermat and Newton in 
the 17th century, and this chord and tangent construction will be considered 
later. Here, we need it only for the equation $x^2 + y^2 = 1$, which is an ideal
showcase for the method in its simplest form (chord only). 


Figure 1.5: Construction of rational points


A trivial solution of this equation is $x = -1$, $y = 0$, which is the point Q
on the unit circle (Figure 1.5). After a moment’s thought, one realizes that
a line through Q, with rational slope t,


$$y = t(x + 1) \tag{3}$$


will meet the circle at a second rational point R. This is because substitution
of $y = t(x + 1)$ in $x^2 + y^2 = 1$ gives a quadratic equation with rational
coefficients and one rational solution ($x = -1$); hence the second solution
must also be a rational value of x. But then the y value of this point will 
also be rational, since t and x will be rational in (3). Conversely, the chord 
joining Q to any other rational point R on the circle will have a rational 
slope. Thus by letting t run through all rational values, we find all rational
points $R \neq Q$ on the unit circle.

What are these points? We find them by solving the equations just discussed.
Substituting $y = t(x + 1)$ in $x^2 + y^2 = 1$ gives


$$x^2 + t^2(x + 1)^2 = 1,$$

or

$$(1 + t^2)x^2 + 2t^2 x + (t^2 - 1) = 0.$$


This quadratic equation in x has solutions $-1$ and $(1 - t^2)/(1 + t^2)$. The
nontrivial solution $x = (1 - t^2)/(1 + t^2)$, when substituted in (3), gives
$y = 2t/(1 + t^2)$.
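
The chord construction just described is easy to carry out in exact rational arithmetic. The sketch below uses Python's standard `fractions` module; the function name `rational_point` is an illustrative choice, not notation from the text.

```python
from fractions import Fraction

def rational_point(t):
    """Second point where the line y = t(x + 1) meets the circle x^2 + y^2 = 1."""
    t = Fraction(t)
    x = (1 - t * t) / (1 + t * t)
    y = 2 * t / (1 + t * t)
    return x, y

# A few rational slopes t = q/p, given here as (p, q).
points = [rational_point(Fraction(q, p)) for p, q in [(2, 1), (3, 2), (4, 1)]]

for x, y in points:
    assert x * x + y * y == 1      # exact check: the point lies on the unit circle
    print(x, y)
```

For the slope t = 1/2 the point is (3/5, 4/5), which corresponds to the triple (3, 4, 5); t = 2/3 gives (5/13, 12/13).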

EXERCISES 


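The parameter t in the pair $\left(\frac{1 - t^2}{1 + t^2}, \frac{2t}{1 + t^2}\right)$ runs through all rational numbers if
$t = q/p$ and p, q run through all pairs of integers.


1.3.1 Deduce that if (a, b, c) is any Pythagorean triple then


$$\frac{a}{c} = \frac{p^2 - q^2}{p^2 + q^2}, \qquad \frac{b}{c} = \frac{2pq}{p^2 + q^2}$$


for some integers p and q.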


1.3.2 Use Exercise 1.3.1 to prove Euclid’s formula for Pythagorean triples, assum- 
ing b even. (Remember, a and b are not both odd.) 


The triples (a, b,c) in Plimpton 322 seem to have been computed to provide 
right-angled triangles covering a range of shapes—their angles actually follow a 
decreasing sequence in roughly equal steps. Figure 1.6 shows the lines with slope 
a/b, ranging from the top value 119/120 for the top line in Plimpton 322, to 56/90 
for the bottom line. 

This raises the question, can the shape of any right-angled triangle be approx- 
imated by a Pythagorean triple? 


1.3.3 Show that any right-angled triangle with hypotenuse 1 may be approxi- 
mated arbitrarily closely by one with rational sides. 
Figure 1.6: Lines of slope a/b corresponding to entries in Plimpton 322


Some important trigonometry may be gleaned from Diophantus’s method if 
we compare the angle at O in Figure 1.4 with the angle at Q in Figure 1.5. The two 
angles are shown in Figure 1.7, and high school geometry shows that the angle at 
Q is half the angle at O. 


1.3.4 Why does the angle at Q equal $\theta/2$? (Hint: consider angles in the red
triangle.)


1.3.5 Use Figure 1.7 to show that $t = \tan\frac{\theta}{2}$ and


$$\cos\theta = \frac{1 - t^2}{1 + t^2}, \qquad \sin\theta = \frac{2t}{1 + t^2}.$$




Figure 1.7: Angles in a circle 


1.4 Right-Angled Triangles 


It is high time we looked at the Pythagorean theorem from the traditional 
point of view, as a theorem about right-angled triangles; however, we will 
be rather brief about its proof. It is not known how the theorem was first 
proved, but probably it was by simple manipulations of area, perhaps sug- 
gested by rearrangement of floor tiles. Just how easy it can be to prove the 
Pythagorean theorem is shown by Figure 1.8, given by Heath (1925) in his 
edition of Euclid’s Elements, Vol. 1, p. 354. Each large square contains four 
copies of the given right-angled triangle. Subtracting these four triangles 
from the large square leaves, on the one hand (Figure 1.8, right), the sum 
of the squares on the two sides of the triangle. On the other hand (left), it also
leaves the square on the hypotenuse. This proof, like the hundreds of others 
that have been given for the Pythagorean theorem, rests on certain geomet- 
ric assumptions. It is in fact possible to transcend geometric assumptions 
by using numbers as the foundation for geometry, and the Pythagorean the- 
orem then becomes true almost by definition, as an immediate consequence 
of the definition of distance (see Section 1.5). 

To the Greeks, however, it did not seem possible to build geometry on 
the basis of numbers, due to a conflict between their notions of number and 
length. In the next section we will see how this conflict arose. 
Figure 1.8: Proof of the Pythagorean theorem


EXERCISES 


A way to see the Pythagorean theorem in a tiled floor was suggested by Mag- 
nus (1974), p. 159, and it is shown in Figure 1.9. (The dotted squares are not tiles; 
they are a hint.) 


Figure 1.9: Pythagorean theorem in a tiled floor 


1.4.1 What has this figure to do with the Pythagorean theorem? 


Euclid’s first proof of the Pythagorean theorem, in Book I of the Elements, is 
also based on area. It depends only on the fact that triangles with the same base and 
height have equal area, though it involves a rather complicated figure. In Book VI, 
Proposition 31, he gives another proof, based on similar triangles (Figure 1.10). 


1.4.2 Show that the three triangles in Figure 1.10 are similar, and hence prove 
the Pythagorean theorem by equating ratios of corresponding sides. 
Figure 1.10: Another proof of the Pythagorean theorem


1.5 Irrational Numbers 


We have mentioned that the Babylonians, although probably aware of the 
geometric meaning of the Pythagorean theorem, devoted most of their atten- 
tion to the whole-number triples it had brought to light, the Pytha- 
gorean triples. Pythagoras and his followers were even more devoted to 
whole numbers. It was they who discovered the role of numbers in musical 
harmony: dividing a vibrating string in two raises its pitch by an octave, 
dividing in three raises the pitch another fifth, and so on. This great discov- 
ery, the first clue that the physical world might have an underlying math- 
ematical structure, inspired them to seek numerical patterns, which to them 
meant whole-number patterns, everywhere. Imagine their consternation 
when they found that the Pythagorean theorem led to quantities that were 
not numerically computable. They found lengths that were incommensu- 
rable, that is, not measurable as integer multiples of the same unit. The ratio 
between such lengths is therefore not a ratio of whole numbers, hence in 
the Greek view not a ratio at all, or irrational. 

The incommensurable lengths discovered by the Pythagoreans were 
the side and diagonal of the unit square. It follows immediately from the 
Pythagorean theorem that 


$$(\text{diagonal})^2 = 1 + 1 = 2.$$


Hence if the diagonal and side are in the ratio m/n (where m and n can be 
assumed to have no common divisor), we have 


$$m^2/n^2 = 2,$$


whence

$$m^2 = 2n^2.$$
The Pythagoreans were interested in odd and even numbers, so they probably
observed that the latter equation, which says that $m^2$ is even, also implies
that m is even, say m = 2p. But if


$$m = 2p,$$

then

$$2n^2 = m^2 = 4p^2;$$

hence

$$n^2 = 2p^2,$$


which similarly implies that n is even, contrary to the hypothesis that m and
n have no common divisor. (This proof is in Aristotle’s Prior Analytics. An 
alternative, more geometric, proof is mentioned in Section 3.4.) 

This discovery had profound consequences. Legend has it that the first 
Pythagorean to make the result public was drowned at sea (see Heath (1921), 
Vol. 1, pp. 65, 154). It led to a split between the theories of number and 
space that was not healed until the 19th century (if then, some believe). The 
Pythagoreans could not accept $\sqrt{2}$ as a number, but no one could deny that
it was the diagonal of the unit square. Consequently, geometrical quantities 
had to be treated separately from numbers or, rather, without mentioning 
any numbers except rationals. Greek geometers thus developed ingenious 
techniques for precise handling of arbitrary lengths in terms of rationals, 
known as the theory of proportions and the method of exhaustion. 

As we will see in Chapter 4, these techniques made necessary use of 
infinity—something that the Greeks were very reluctant to do. 


The Reconciliation of Numbers with Geometry 


As we now know, it is not necessary to deny that $\sqrt{2}$ is a number, or to do
geometry without applying the processes of arithmetic to lengths, areas, 
and volumes. In the 1620s, Fermat and Descartes realized that, if lengths 
are viewed as numbers, then each point P in the plane is given by an ordered 
pair (x, y) of numbers, called the coordinates of P. The coordinates x and 
y are respectively the horizontal and vertical distances of P from an origin 
O. We tell the story of their discovery, and the reasons for its success, in 
Chapter 6. 

In coordinate geometry one can define the distance between any two 
points, guided by none other than the Pythagorean theorem. If $P_1 = (x_1, y_1)$
and $P_2 = (x_2, y_2)$ then the line $P_1P_2$ from $P_1$ to $P_2$ is the hypotenuse of a
triangle with horizontal side $x_2 - x_1$ and vertical side $y_2 - y_1$ (Figure 1.11).


Figure 1.11: Distance via the Pythagorean theorem


Since the square of the hypotenuse is the sum of the squares on the other
two sides,


$$(x_2 - x_1)^2 + (y_2 - y_1)^2,$$


we should define


$$\text{length of } P_1P_2 = \sqrt{(x_2 - x_1)^2 + (y_2 - y_1)^2}.$$


It follows, for example, that the points (x, y) at distance 1 from O satisfy 
the equation $x^2 + y^2 = 1$, which we called the equation of the (unit) circle
in Section 1.3. The coordinate geometry of Fermat and Descartes is part of 
what is now called algebraic geometry, a vast expansion of Greek geome- 
try. Algebraic geometry was made possible by 16th century discoveries in 
algebra, which brought the study of curves into alignment with the study 
of polynomial equations. 
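
As a small illustration of distance defined through coordinates, here is a sketch in Python. The names `distance` and `on_unit_circle` are illustrative only, and because the computation uses floating-point numbers the test for lying on the unit circle is made with a tolerance `tol`, which is an assumption of this sketch.

```python
from math import hypot, isclose

def distance(p1, p2):
    """Length of the segment P1P2, defined via the Pythagorean theorem."""
    (x1, y1), (x2, y2) = p1, p2
    # hypot(u, v) computes sqrt(u^2 + v^2)
    return hypot(x2 - x1, y2 - y1)

def on_unit_circle(point, tol=1e-12):
    """True if the point is at distance 1 from the origin O = (0, 0)."""
    return isclose(distance((0.0, 0.0), point), 1.0, abs_tol=tol)

diagonal = distance((0, 0), (1, 1))   # diagonal of the unit square
print(diagonal)
print(on_unit_circle((0.6, 0.8)))
```

The value of `diagonal` is only a decimal approximation to $\sqrt{2}$, which is in keeping with the fact that $\sqrt{2}$ is not a ratio of whole numbers.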


A coordinate geometry closer in content to Greek geometry, particu- 
larly that of Euclid, was developed by Grassmann in the 1840s. Grass- 
mann’s geometry is part of what we now call linear algebra, and its key 
concept—the inner product—is also inspired by the Pythagorean theorem. 
For more on linear algebra and the inner product, see Section 16.2. 




EXERCISES 


The crucial step in the proof that $\sqrt{2}$ is irrational is showing that $m^2$ even
implies m is even or, equivalently, that m odd implies $m^2$ odd. It is worth taking a
closer look at why this is true.


1.5.1 Writing an arbitrary odd number m in the form 2q + 1, for some integer q,
show that $m^2$ also has the form 2r + 1, which shows that $m^2$ is also odd.


You probably did some algebra like this in Exercise 1.2.3, but if not, here is 
your chance: 


1.5.2 Show that the square of 2q + 1 is in fact of the form 4s + 1, and hence
explain why every integer square leaves remainder 0 or 1 on division by 4. 


Greek Geometry


PREVIEW 


Geometry was the first branch of mathematics to become highly devel- 
oped. The concepts of “theorem” and “proof” originated in geometry, and 
most mathematicians until recent times were introduced to their subject 
through the geometry in Euclid’s Elements. 

In the Elements one finds the first system for deriving theorems from 
supposedly self-evident statements called axioms. Euclid’s axioms are 
incomplete and one of them, the so-called parallel axiom, is not as obvi- 
ous as the others. Nevertheless, it took over 2000 years to produce a clearer 
foundation for geometry. 

The climax of the Elements is the investigation of the regular poly- 
hedra, five symmetric figures in three-dimensional space. The five regular 
polyhedra make several appearances in mathematical history, most impor- 
tantly in the theory of symmetry—group theory—discussed in Chapter 14. 

The Elements contains not only proofs but also many constructions, 
by ruler and compass. However, three constructions are conspicuous by 
their absence: duplication of the cube, trisection of the angle, and squaring 
the circle. These problems were not properly understood until the 19th 
century, when they were resolved (in the negative) by algebra and analysis. 

The only curves in the Elements are circles, but the Greeks studied 
many other curves, such as the conic sections. Again, many problems that 
the Greeks could not solve were later clarified by algebra. In particular, 
curves can be classified by degree, and the conic sections are the curves of 
degree 2, as we will see in Chapter 6. 
2.1 The Deductive Method


He was 40 years old before he looked on Geometry; which 
happened accidentally. Being in a Gentleman’s Library, Euclid’s 
Elements lay open, and ’twas the 47 El. libri I. He read the 
Proposition. By G——sayd he (he would now and then sweare 
an emphaticall Oath by way of emphasis) this is impossi- 
ble! So he reads the Demonstration of it, which referred him 
back to such a Proposition; which proposition he read. That 
referred him back to another, which he also read .. . that at last 
he was demonstratively convinced of that trueth. This made 
him in love with Geometry. 


This quotation about the philosopher Thomas Hobbes (1588-1679), 
from Aubrey’s Brief Lives, beautifully captures the force of Greece’s most 
important contribution to mathematics, the deductive method. (The propo- 
sition mentioned, incidentally, is the Pythagorean theorem.) 

We have seen that significant results were known before the period of 
classical Greece, but the Greeks were the first to find results by deduction 
from previously established results, resting ultimately on the most evident 
possible statements, called axioms. Thales (624–547 BCE) is thought to be
the originator of this method (see Heath (1921), p. 128), and by 300 BCE
Euclid’s Elements set the standard for mathematical rigor until the 19th 
century. But the Elements is difficult, so in time it was boiled down to 
the simplest and driest propositions about lines, angles, and circles. These 
propositions are based on the following axioms (in the translation of Heath 
(1925), p. 154), which Euclid called postulates and common notions. 


Postulates 


Let the following be postulated: 

1. To draw a straight line from any point to any point.

2. To produce a finite straight line continuously in a straight line.

3. To describe a circle with any center and distance.

4. That all right angles are equal to one another.

5. That, if a straight line falling on two straight lines make the interior angles
on the same side less than two right angles, the two straight lines, if produced
indefinitely, meet on that side on which are the angles less than the
two right angles.
Common Notions


1. Things which are equal to the same thing are also equal to one another.

2. If equals be added to equals, the wholes are equal.

3. If equals be subtracted from equals, the remainders are equal.

4. Things which coincide with one another are equal to one another.

5. The whole is greater than the part.


It appears that Euclid’s intention was to deduce geometric propositions 
from visually evident statements (the postulates) using evident principles 
of logic (the common notions). Actually, he often made unconscious use of 
visually plausible assumptions that are not among his postulates. His very 
first proposition used the unstated assumption that two circles meet if the 
center of each is on the circumference of the other (Heath (1925), p. 242). 
Nevertheless, such flaws were not noticed until the 19th century, and they 
were rectified by Hilbert (1899). By themselves, they probably would not 
have been enough to end the Elements’ run of 22 centuries as a leading 
textbook. The Elements was overthrown by more serious mathematical 
upheavals in the 19th century. The so-called non-Euclidean geometries, 
using alternatives to Euclid’s fifth postulate (the parallel axiom), devel- 
oped to the point where the old axioms could no longer be considered 
self-evident (see Chapter 13). At the same time, the concept of number 
matured to the point where irrational numbers became acceptable, and 
indeed preferable to intuitive geometric concepts, in view of the doubts 
about what the self-evident truths of geometry really were. 

The outcome was a more adaptable language for geometry in which 
“points,” “lines,” and so on, could be defined, usually in terms of numbers, 
so as to suit the type of geometry under investigation. Such a develop- 
ment was long overdue. Even in Euclid’s time the Greeks were investigat- 
ing curves more complicated than circles, which did not fit conveniently 
in Euclid’s system. Descartes (1637) introduced the coordinate method, 
which gives a single framework for handling both Euclid’s geometry and 
higher curves (see Chapter 6), but it was not at first realized that coordi- 
nates allowed geometry to be entirely rebuilt on numerical foundations. 

The comparatively trivial step (for us) of passing to axioms about num- 
bers from axioms about points had to wait until the 19th century, when 
geometric axioms about points lost authority and number-theoretic axioms 
gained it. We say more about these developments later (and about problems with the
authority of axioms in general, which arose in the 20th century). For the
remainder of this chapter we will look at some important nonelementary 
topics in Greek geometry, using the coordinate framework where conve- 
nient. 


EXERCISES 


Euclid’s Common Notions 1 and 4 define what we now call an equivalence
relation, which is not necessarily the equality relation. In fact, the kind of relation
Euclid had in mind was equality in some geometric quantity such as length or
angle (but not necessarily equality in all respects; the latter is what he meant by
“coinciding”). An equivalence relation $\cong$ is normally defined by three properties.
For any a, b, and c:


$a \cong a$, (reflexive)
$a \cong b \Rightarrow b \cong a$, (symmetric)
$a \cong b$ and $b \cong c \Rightarrow a \cong c$. (transitive)


2.1.1 Explain how Common Notions 1 and 4 may be interpreted as the transitive
and reflexive properties. Note that the natural way to write Common Notion 
1 symbolically is slightly different from the statement of transitivity above. 


2.1.2 Show that the symmetric property follows from Euclid’s Common Notions 
1 and 4. 


Hilbert (1899) took advantage of Euclid’s Common Notions 1 and 4 in his
rectification of Euclid’s axiom system. He defined equality of length by postulat- 
ing a transitive and reflexive relation on line segments, and stated transitivity in 
the style of Euclid, so that the symmetric property was a consequence. 


2.2 The Regular Polyhedra 


Greek geometry is virtually complete as far as the elementary properties of 
plane figures are concerned. It is fair to say that only a handful of interest- 
ing elementary propositions about triangles and circles have been discov- 
ered since Euclid’s time. Solid geometry is much more challenging, even 
today, so it is understandable that it was left in a less complete state by the 
Greeks. Nevertheless, they made some very impressive discoveries and 
managed to complete one of the most beautiful chapters in solid geom- 
etry, the enumeration of the regular polyhedra. The five possible regular 
polyhedra are shown in Figure 2.1. (Images courtesy of Wikimedia.) 
Figure 2.1: Tetrahedron, cube, octahedron, dodecahedron, icosahedron


Each polyhedron is convex and is bounded by a number of congruent 
polygonal faces, the same number of faces meet at each vertex, and in each 
face all the sides and angles are equal, hence the term regular polyhedron. 
A regular polyhedron is a spatial figure analogous to a regular polygon in 
the plane. But whereas there are regular polygons with any number $n \geq 3$
of sides, there are only five regular polyhedra. 


This fact is easily proved and may go back to the Pythagoreans (see,
for example, Heath (1921), p. 159). One considers the possible polygons
that can occur as faces, their angles, and the numbers of them that can
occur at a vertex. For a 3-gon (triangle) the angle is $\pi/3$, so three, four, or
five can occur at a vertex, but six cannot, as this would give a total angle
$2\pi$ and the vertex would be flat. For a 4-gon the angle is $\pi/2$, so three can
occur at a vertex, but not four. For a 5-gon the angle is $3\pi/5$, so three can
occur at a vertex, but not four. For a 6-gon the angle is $2\pi/3$, so not even
three can occur at a vertex. But at least three faces must meet at each ver- 
tex of a polyhedron, so 6-gons (and, similarly, 7-gons, 8-gons, ...) cannot 
occur as faces of a regular polyhedron. This leaves only the five possibili- 
ties just listed, which correspond to the five known regular polyhedra. 
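
The counting argument above is easy to mechanize. The following Python sketch lists the pairs (p, q) for which q regular p-gons can meet at a vertex with total angle less than $2\pi$. The function name and the cutoff `max_sides` are illustrative choices of ours, not part of the historical argument; angles are measured in units of $\pi$ so that the arithmetic is exact.

```python
from fractions import Fraction


def regular_polyhedra_candidates(max_sides=12):
    """Return pairs (p, q): q regular p-gons fit at a vertex with angle sum < 2*pi."""
    found = []
    for p in range(3, max_sides + 1):
        # Interior angle of a regular p-gon, in units of pi: (p - 2) / p.
        angle = Fraction(p - 2, p)
        q = 3  # at least three faces must meet at a vertex
        while q * angle < 2:
            found.append((p, q))
            q += 1
    return found


print(regular_polyhedra_candidates())
# prints [(3, 3), (3, 4), (3, 5), (4, 3), (5, 3)]
```

The five pairs correspond, in order, to the tetrahedron, octahedron, icosahedron, cube, and dodecahedron. Raising `max_sides` adds nothing, since the angle of a p-gon only grows with p.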


But do these five really exist? There is no trouble constructing the 
tetrahedron, cube, or octahedron, but it is not clear that, say, 20 equilateral 
triangles will fit together to form a closed surface. Euclid found this prob- 
lem difficult enough to be placed near the end of the Elements, and few 
of his readers ever mastered his solution. A beautiful direct construction 
was given by Luca Pacioli, a friend of Leonardo da Vinci's, in his book De 
divina proportione (1509). Pacioli’s construction uses three copies of the 
golden rectangle, with sides 1 and $(1 + \sqrt{5})/2$, interlocking as in Figure
2.2. The 12 vertices define 20 triangles such as ABC, and it suffices to 
show that these are equilateral, that is, AB = 1. This is a straightforward 
exercise in the Pythagorean theorem (Exercise 2.2.2). 
Figure 2.2: Pacioli’s construction of the icosahedron
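
Before attempting the exercise, one can run a numerical sanity check of Pacioli's construction. The Python sketch below assumes one particular placement of the three golden rectangles (centered at the origin, one in each coordinate plane); the coordinates and names are our own choices, not Pacioli's. It counts the pairs of corners at distance 1 and the triples that are pairwise at distance 1.

```python
from itertools import combinations
from math import sqrt, isclose

tau = (1 + sqrt(5)) / 2  # the golden ratio


def golden_rectangle_vertices():
    """Corners of three mutually perpendicular 1-by-tau rectangles centered at the origin."""
    pts = []
    for s in (-0.5, 0.5):
        for t in (-tau / 2, tau / 2):
            pts.append((0.0, s, t))  # rectangle in the yz-plane
            pts.append((t, 0.0, s))  # rectangle in the zx-plane
            pts.append((s, t, 0.0))  # rectangle in the xy-plane
    return pts


def dist(p, q):
    """Euclidean distance between two points in space."""
    return sqrt(sum((a - b) ** 2 for a, b in zip(p, q)))


pts = golden_rectangle_vertices()
# Pairs of corners at distance 1 (up to rounding): the candidate edges.
edges = [(p, q) for p, q in combinations(pts, 2) if isclose(dist(p, q), 1.0)]
# Triples of corners that are pairwise at distance 1: equilateral triangles.
triangles = [
    t for t in combinations(pts, 3)
    if all(isclose(dist(p, q), 1.0) for p, q in combinations(t, 2))
]
print(len(pts), len(edges), len(triangles))
# prints 12 30 20
```

The counts 12, 30, and 20 match the vertices, edges, and faces of the icosahedron. A floating-point check of this kind is only evidence, of course; the proof is the computation asked for in Exercise 2.2.2.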


The regular polyhedra will make another important appearance in yet 
another 19th-century development, the theory of finite groups and Galois 
theory. See Chapter 14. Before the regular polyhedra made this triumphant 
comeback, they also took part in a famous fiasco: the Kepler (1596) the- 
ory of planetary distances. Kepler’s theory is summarized by his famous 
diagram (Figure 2.3) of the five polyhedra, nested in such a way as to pro- 
duce six spheres of radii proportional to the distances of the six planets 
then known. Unfortunately, although mathematics could not permit any 
more regular polyhedra, nature could permit more planets, and Kepler’s 
theory was ruined when Uranus was discovered in 1781. 


EXERCISES 


The ratios between successive radii in Kepler’s construction depend on what 
may be called the inradius and circumradius of each polyhedron—the radii of the 
spheres that touch it on the inside and the outside. It happens that the ratio 


$$\frac{\text{circumradius}}{\text{inradius}}$$


is the same for the cube and the octahedron, and it is also the same for the dodec- 
ahedron and the icosahedron. This implies that the cube and octahedron can be 
exchanged in Kepler’s construction, as can the dodecahedron and the icosahe- 
dron. Thus there are at least four different arrangements of the regular polyhedra 
that yield the same sequence of radii. 

It is easy to see why the cube and the octahedron are interchangeable. 
Figure 2.3: Kepler’s diagram of the polyhedra


2.2.1 Show that $\text{circumradius}/\text{inradius} = \sqrt{3}$ for both the cube and the octahedron.


To compute circumradius/inradius for the icosahedron and the dodecahedron 
is quite difficult, and we will not pursue it further, other than verifying that Paci- 
oli’s construction gives a figure bounded by equilateral triangles. 


2.2.2 Check Pacioli’s construction: use the Pythagorean theorem to show that 
AB = BC = CA in Figure 2.2. (It may help to use the additional fact that 
$\tau = (1 + \sqrt{5})/2$ satisfies $\tau^2 = \tau + 1$.)


2.3 Ruler and Compass Constructions


Greek geometers prided themselves on their logical purity; nevertheless, 
they were guided by intuition about physical space. One aspect of Greek 
geometry that was peculiarly influenced by physical considerations was 
the theory of constructions. Much of the elementary geometry of straight 
lines and circles can be viewed as the theory of constructions by ruler and 
compass. (By a “ruler” we mean simply a straightedge; it is not assumed 
to have any marks on it.) The very subject matter, lines and circles, reflects 
the instruments used to draw them. And many of the elementary problems
of geometry—for example, to bisect a line segment or angle, construct a 
perpendicular, or draw a circle through three given points—can be solved 
by ruler and compass constructions. 

When coordinates are introduced, it is not hard to show that the points
constructible from points $P_1, \ldots, P_n$ have coordinates in the set of numbers
generated from the coordinates of $P_1, \ldots, P_n$ by the operations $+$, $-$, $\times$, $\div$,
and $\sqrt{\ }$ (see Moise (1963) or the exercises to Section 5.3). Square roots
arise, of course, because of the Pythagorean theorem: if points $(a, b)$ and
$(c, d)$ have been constructed, then so has the distance $\sqrt{(c - a)^2 + (d - b)^2}$
between them (Section 1.5). Conversely, it is possible to construct $\sqrt{l}$ for
any given length $l$ (Exercise 2.3.2).

Seen from this viewpoint, ruler and compass constructions look very
special and unlikely to yield numbers such as $\sqrt[3]{2}$, for example. Just this
number comes up in the classical Greek problem called duplication of the
cube, since doubling the volume of a cube amounts to multiplying its side
by $\sqrt[3]{2}$. Other notorious problems were trisection of the angle and squaring
the circle.¹ The latter problem was to construct a square equal in area to
a given circle or to construct the number $\pi$, which amounts to the same
thing. The Greeks sought ruler and compass solutions, though the possibility of
a negative solution was admitted and solutions by less elementary means
were tolerated. We will see some of these in the next sections.

The impossibility of solving these problems by ruler and compass con- 
structions was not proved until the 19th century. For the duplication of 
the cube and trisection of the angle, impossibility was shown by Wantzel 
(1837). Wantzel seldom receives credit for settling these problems, which 
had baffled the best mathematicians for 2000 years, perhaps because his 
methods were superseded by the more powerful theory of algebraic num- 
bers (see Chapter 16). 

The impossibility of squaring the circle was proved by Lindemann
(1882), in a very strong way. Not only is $\pi$ undefinable by rational
operations and square roots; it is also transcendental, that is, not the root of any
polynomial equation with rational coefficients. Like Wantzel’s work, this
was a rare example of a major result proved by a minor mathematician. In
Lindemann’s case the explanation is perhaps that a major step had already
been taken when Hermite (1873) proved the transcendence of $e$. Accessible
proofs of both these results can be found in Klein (1924). Lindemann’s
subsequent career was mathematically undistinguished, even embarrassing.
In response to skeptics who thought his success with $\pi$ had been a
fluke, he took aim at the most famous unsolved problem in mathematics,
“Fermat’s last theorem” (see Chapter 10 for the origin of this problem).
His efforts fizzled out in a series of inconclusive papers, each one correcting
an error in the one before. Fritsch (1984) has written an interesting
biographical article on Lindemann.

¹ The term “squaring,” or its Latin equivalent “quadrature,” later became a general term
for finding the area of curved regions, particularly in the 17th century, when calculus solved
many such problems. Since ancient times, “squaring the circle” has been a popular
phrase for trying to do the impossible.

One ruler and compass problem is still open: which regular n-gons are
constructible? Gauss discovered in 1796 that the 17-gon is constructible
and then showed that a regular n-gon is constructible if and only if
$n = 2^m p_1 p_2 \cdots p_k$, where the $p_i$ are distinct primes of the form $2^{2^h} + 1$. (This
problem is also known as circle division, because it is equivalent to dividing
the circumference of a circle, or the angle $2\pi$, into n equal parts.) The
proof of necessity was actually completed by Wantzel (1837). However, it
is still not explicitly known what these primes are, or even whether there
are infinitely many of them. The only ones known are for $h = 0, 1, 2, 3, 4$.
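
The criterion is easy to experiment with. The Python sketch below computes the numbers $2^{2^h} + 1$ for h = 0, ..., 5, tests them for primality by trial division, and then applies the criterion of Gauss and Wantzel using only the primes found. The helper names are hypothetical, and stopping at h = 5 is an assumption that is harmless for small n, since any further prime of this form would be enormous.

```python
def is_prime(n):
    """Primality by trial division; adequate for the small numbers used here."""
    if n < 2:
        return False
    d = 2
    while d * d <= n:
        if n % d == 0:
            return False
        d += 1
    return True


# Numbers of the form 2^(2^h) + 1 for h = 0, ..., 5, and the primes among them.
fermat_numbers = [2 ** (2 ** h) + 1 for h in range(6)]
fermat_primes = [f for f in fermat_numbers if is_prime(f)]


def constructible(n, primes=fermat_primes):
    """True if n = 2^m * (product of distinct primes from the given list)."""
    if n < 3:
        return False
    while n % 2 == 0:
        n //= 2
    for p in primes:
        if n % p == 0:
            n //= p  # remove p once; a repeated factor p will remain in n
    return n == 1


print(fermat_primes)
# prints [3, 5, 17, 257, 65537]
print([n for n in range(3, 21) if constructible(n)])
# prints [3, 4, 5, 6, 8, 10, 12, 15, 16, 17, 20]
```

The number for h = 5 drops out of the list because it is divisible by 641, which is why the known primes stop at h = 4.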


EXERCISES 


Many of the constructions made by the Greeks are simplified by translating 
them into algebra, where it turns out that constructible lengths are those that can 
be built from known lengths by the operations of $+$, $-$, $\times$, $\div$, and $\sqrt{\ }$. It is there-
fore enough to know constructions for these five basic operations. Addition and
subtraction are obvious, and the other operations are covered in the following 
exercises, together with an example in which algebra is a distinct advantage. 


2.3.1 Show, using similar triangles, that if lengths $l_1$ and $l_2$ are constructible, then
so are $l_1 l_2$ and $l_1/l_2$.


2.3.2 Use similar triangles to explain why $\sqrt{l}$ is the length shown in Figure 2.4,
and hence show that $\sqrt{l}$ is constructible from $l$.


Figure 2.4: Square root construction (the constructed length is labeled $\sqrt{l}$)
One of the finest ruler and compass constructions from ancient times is that of
the regular pentagon, which includes, yet again, the golden ratio $\tau = (1 + \sqrt{5})/2$.
Knowing (from the questions above) that this number is constructible, it becomes 
easy for us to construct the pentagon itself. 


2.3.3 By finding some parallels and similar triangles in Figure 2.5, show that the 
diagonal x of the regular pentagon of side 1 satisfies $x/1 = 1/(x - 1)$.


Figure 2.5: The regular pentagon (side labeled 1)


2.3.4 Deduce from Exercise 2.3.3 that the diagonal of the pentagon is $(1 + \sqrt{5})/2$
and hence that the regular pentagon is constructible. 


2.4 Conic Sections 


Conic sections are the curves obtained by cutting a circular cone by a 
plane: ellipses (including circles), parabolas, and hyperbolas (Figure 2.6, 
left to right). Today we know the conic sections better by their equations: 


$\frac{x^2}{a^2} + \frac{y^2}{b^2} = 1$, (ellipse)
$y = ax^2$, (parabola)
$\frac{x^2}{a^2} - \frac{y^2}{b^2} = 1$. (hyperbola)


More generally, any second-degree equation represents a conic section or 
a pair of straight lines, a result that was proved by Descartes (1637). 

The names “ellipse,” “parabola,” and “hyperbola” come from the
Greek, meaning roughly “too little,” “alongside,” and “too much.” The
ellipse arises by cutting with a plane that slopes too little (to make an
infinite curve), the parabola from a plane parallel to one side of the cone, and
the hyperbola from a plane that slopes too much to avoid hitting the other
part of the cone, so it produces a curve with two branches.

Figure 2.6: Ellipse, parabola, hyperbola


The invention of conic sections is attributed to Menaechmus (fourth 
century BCE), a contemporary of Alexander the Great. Alexander is said to 
have asked Menaechmus for a crash course in geometry, but Menaechmus 
refused, saying, “There is no royal road to geometry.” Menaechmus used 
conic sections to give a very simple solution to the problem of duplicating 
the cube. In algebraic notation, this can be described as finding the
intersection of the parabola $y = \frac{1}{2}x^2$ with the hyperbola $xy = 1$. This yields


$$x \cdot \tfrac{1}{2}x^2 = 1 \quad\text{or}\quad x^3 = 2.$$


The theory and practice of conic sections finally came together when 
Kepler (1609) found the orbits of the planets to be ellipses, and Newton 
(1687) explained this fact by his law of gravitation. This wonderful vindi- 
cation of the theory of conic sections has often been seen as basic research 
receiving its long overdue reward, but perhaps one can also see it as a 
rebuke to Greek disdain for applications. As for Kepler himself ... to the
end of his days he was proudest of his theory explaining the distances of 
the planets in terms of the five regular polyhedra (Section 2.2). 
EXERCISES


A key feature of the ellipse for both geometry and astronomy is a point called 
the focus. The term is the Latin word for fireplace, and it was introduced by 
Kepler. The ellipse actually has two foci, and they have the geometric property 
that the sum of the distances from the foci $F_1$, $F_2$ to any point P on the ellipse is
constant. 


2.4.1 This property gives a way to draw an ellipse using two pins and a piece of
string. Explain how. 


2.4.2 By introducing suitable coordinate axes, show that a curve with the above
“constant sum” property indeed has an equation of the form

$$\frac{x^2}{a^2} + \frac{y^2}{b^2} = 1.$$

(It is a good idea to start with the two square root terms, representing the
distances $F_1P$ and $F_2P$, on opposite sides of the equation.) Show also that
any equation of this form is obtainable by suitable choice of $F_1$, $F_2$, and
$F_1P + F_2P$.


Another interesting property of the lines from the foci to a point P on the
ellipse is that they make equal angles with the tangent at P. It follows that a light
ray from $F_1$ to P is reflected through $F_2$. A simple proof of this can be based on
the shortest-path property of reflection, shown in Figure 2.7 and discovered by the
Greek scientist Heron around 100 CE.


Figure 2.7: The shortest-path property


Shortest-path property. The path $F_1PF_2$ of reflection in the line L from $F_1$ to
$F_2$ is shorter than any other path $F_1P'F_2$ from $F_1$ to L to $F_2$.


2.4.3 Prove the shortest-path property, by considering the two paths $F_1P\overline{F}_2$ and
$F_1P'\overline{F}_2$, where $\overline{F}_2$ is the reflection of the point $F_2$ in the line L.

Thus to prove that the lines $F_1P$ and $F_2P$ make equal angles with the tangent,
it is enough to show that $F_1PF_2$ is shorter than $F_1P'F_2$ for any other point $P'$ on
the tangent at P.


2.4.4 Prove this, using the fact that $F_1PF_2$ has the same length for all points P
on the ellipse.


Kepler’s great discovery was that the focus is also significant in astronomy. 
One focus is the point occupied by the sun as the planet moves along its ellipse. 


2.5 Higher-Degree Curves 


The Greeks lacked a systematic theory of higher-degree curves, because 
they lacked a systematic algebra. They could find what amounted to carte- 
sian equations (in words) of individual curves—“symptoms,” as they called 
them; see van der Waerden (1954), p. 241—but they did not consider equa- 
tions in general or notice any of their properties relevant to the study of 
curves, for example, the degree. Nevertheless, they studied many inter- 
esting special curves, which Descartes and his followers cut their teeth on 
when algebraic geometry finally emerged in the 17th century. An excellent 
and well-illustrated account of these early investigations may be found in 
Brieskorn and Knörrer (1981), Chapter 1.

In this section we must confine ourselves to brief remarks on a few 
examples. 


The Cissoid of Diocles (around 100 BCE)

This curve is defined using an auxiliary circle, which for convenience
we take to be the unit circle, and vertical lines through $x$ and $-x$. It consists
of all the points P seen in Figure 2.8.

The portion shown in red results from varying x between 0 and 1. It is
a cubic curve with cartesian equation


$$y^2(1 + x) = (1 - x)^3.$$


This equation shows that if $(x, y)$ is a point on the curve, then so is $(x, -y)$.
Hence one gets the complete picture of it by reflecting the portion shown 
in Figure 2.8 in the x-axis. The result is a sharp point at R, a cusp, a 
phenomenon that first arises with cubic curves. Diocles showed that the 
cissoid could be used to duplicate the cube, which is plausible (though 
still not obvious!) once one knows that this curve is cubic. 
Figure 2.8: Construction of the cissoid


The Spiric Sections of Perseus (around 150 BCE)

Apart from the sphere, cylinder, and cone—whose sections are all conic 
sections—one of the few surfaces studied by the Greeks was the torus. 
This surface, generated by rotating a circle about an axis outside the cir- 
cle, but in the same plane, was called a spira by the Greeks—hence the 
name spiric sections for the sections by planes parallel to the axis. These 
sections, which were first studied by Perseus, have four qualitatively dis- 
tinct forms (see Figure 2.9). 

These forms—convex ovals, “squeezed” ovals, the figure 8, and pairs 
of ovals—were rediscovered in the 17th century when analytic geometers 
looked at curves of degree 4, of which the spiric sections are examples. 
For suitable choice of torus, the figure 8 curve becomes the lemniscate
of Bernoulli and the convex ovals become Cassini ovals. Cassini (1625–1712)
was a distinguished astronomer but an opponent of Newton’s theory
of gravitation. He rejected Kepler’s ellipses and instead proposed Cassini 
ovals as orbits for the planets. 
Figure 2.9: Spiric sections


The Epicycles of Ptolemy (140 CE)


These curves are known from a famous astronomical work, the Almagest 
of Claudius Ptolemy. Ptolemy himself attributes the idea to Apollonius. 
It seems almost certain that this is the Apollonius who mastered conic 
sections, which is ironic, because epicycles were his candidates for the 
planetary orbits, destined to be defeated by those very same conic sections. 


An epicycle, in its simplest form, is the path traced by a point on a cir- 
cle that rolls on another circle (Figure 2.10). More complicated epicycles 
can be defined by having a third circle roll on the second, and so on. The 
Greeks introduced these curves to try to reconcile the complicated move- 
ments of the planets, relative to the fixed stars, with a geometry based on 
the circle. In principle, this is possible! Lagrange (1772) showed that any 
motion along the celestial equator can be approximated arbitrarily closely 
by epicyclic motion, and a more modern version of the result may be found
in Sternberg (1969). But Ptolemy’s mistake was to accept the apparent 
complexity of the motions of the planets as actual in the first place. As we 
now know, the motion becomes simple when one considers motion relative 
to the sun rather than to the earth and allows orbits to be ellipses. 
Figure 2.10: Generating an epicycle


Epicycles still have a role to play in engineering, and their mathemat- 
ical properties are interesting. Some of them are closed curves and turn 
out to be algebraic, that is, of the form p(x,y) = 0 for a polynomial p. 
Others, such as those that result from rolling circles whose radii have an 
irrational ratio, lie densely in a certain region of the plane and hence can- 
not be algebraic; an algebraic curve p(x, y) = 0 can meet a straight line 
y = mx +c in only a finite number of points, corresponding to roots of the 
polynomial equation p(x, mx + c) = 0, and the dense epicycles meet some 
lines infinitely often. 

An obvious relative of the epicycles is the cycloid, the curve traced by 
a point on a circle that rolls on a straight line. The cycloid does not seem to 
have been studied by the Greeks, but it became a favorite of 17th-century 
mathematicians. As we will see in Chapter 13, spectacular properties of 
the cycloid were revealed by the methods of calculus. 


EXERCISES 
The equation of the cissoid is derivable as follows. 


2.5.1 Using X and Y for the horizontal and vertical coordinates, show that the
straight line RP in Figure 2.8 has equation

$$Y = \frac{\sqrt{1 - x^2}}{1 + x}(X - 1).$$


2.5.2 Deduce the equation of the cissoid from Exercise 2.5.1.
The simplest epicyclic curve is the cardioid (“heart-shape”), which results
from a circle rolling on a fixed circle of the same size. 


2.5.3 Sketch a picture of the cardioid, confirming that it is heart-shaped (sort of). 


2.5.4 Show that if both circles have radius 1, and we follow the point on the 
rolling circle initially at (1, 0), then the cardioid it traces out has parametric 
equations 


$$x = 2\cos\theta - \cos 2\theta,$$

$$y = 2\sin\theta - \sin 2\theta.$$


The cardioid is an algebraic curve. Its cartesian equation may be hard to dis- 
cover, but it is easy to verify, especially if one has a computer algebra system. 


2.5.5 Check that the point (x, y) on the cardioid satisfies 


$$(x^2 + y^2 - 1)^2 = 4\left((x - 1)^2 + y^2\right).$$
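
In the absence of a computer algebra system, a numerical spot check is a reasonable first step. The Python sketch below samples the parametric equations of Exercise 2.5.4 at 360 angles and evaluates the difference between the two sides of the cartesian equation $(x^2 + y^2 - 1)^2 = 4((x - 1)^2 + y^2)$. The function names and the tolerance are our own choices, and a small residual is evidence rather than proof.

```python
from math import cos, sin, pi


def cardioid_point(theta):
    """Point traced at rolling angle theta when both circles have radius 1."""
    return (2 * cos(theta) - cos(2 * theta), 2 * sin(theta) - sin(2 * theta))


def cartesian_residual(x, y):
    """Left side minus right side of (x^2 + y^2 - 1)^2 = 4((x - 1)^2 + y^2)."""
    return (x * x + y * y - 1) ** 2 - 4 * ((x - 1) ** 2 + y * y)


# Sample one full turn in steps of one degree.
samples = [2 * pi * k / 360 for k in range(360)]
max_residual = max(abs(cartesian_residual(*cardioid_point(t))) for t in samples)
print(max_residual < 1e-9)
# prints True
```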


Greek Number Theory


PREVIEW 


Number theory is the second large field of mathematics that comes to us 
from the Pythagoreans via Euclid. The Pythagorean theorem led mathe- 
maticians to the study of squares and sums of squares; Euclid drew atten- 
tion to the primes by proving that there are infinitely many of them. 

His investigations were based on the Euclidean algorithm, a method 
for finding the greatest common divisor of two natural numbers. Common 
divisors are the key to basic results about prime numbers, in particular 
unique prime factorization, which says that each natural number factors 
into primes in exactly one way. 

Another discovery of the Pythagoreans, the irrationality of $\sqrt{2}$, has
consequences for natural numbers. Since $\sqrt{2} \neq m/n$ for any natural
numbers $m, n$, there is no nonzero integer solution of the equation $x^2 - 2y^2 = 0$. But
there are integer solutions of $x^2 - 2y^2 = 1$, and in fact infinitely many of
them. The same is true of the equation $x^2 - Ny^2 = 1$ for any nonsquare
natural number $N$.

The latter equation, called Pell’s equation, is perhaps second in fame 
only to the Pythagorean equation $x^2 + y^2 = z^2$, among equations for which
integer solutions are sought. Equations for which integer or rational solu- 
tions are sought are called Diophantine, after Diophantus. The methods he 
used to solve quadratic and cubic Diophantine equations are still of inter- 
est. We study his method for cubics in this chapter, and take it up again in 
Chapter 10. 
3.1 The Role of Number Theory


In Chapter 1 we saw that number theory has been part of mathematics 
for at least as long as geometry, and from a foundational point of view it 
may be more important. Despite this, number theory resists a systematic 
treatment like that undergone by elementary geometry in Euclid’s Ele- 
ments. At all stages in its development, number theory has had glaring gaps 
because of the intractability of elementary problems. Most of the really 
old unsolved problems in mathematics, in fact, are simple questions about 
the natural numbers 1,2,3,.... The nonexistence of a general method for 
solving Diophantine equations (Section 1.3) and the problem of identify-
ing the primes of the form $2^{2^h} + 1$ (Section 2.3) have been noted. Other
unsolved number theory problems will come up in the sections below. 

As a consequence, the role of number theory in the history of mathe- 
matics has been quite different from that of geometry. Geometry has played 
a stabilizing and unifying role, to the point of retarding further develop- 
ment at times and creating the popular impression that mathematics is a 
static subject. Number theory has been a spur to progress and change. 
Before 1800, only a handful of mathematicians contributed to advances 
in number theory, but they include some of the greats—Diophantus, Fer- 
mat, Euler, Lagrange, and Gauss. This book stresses advances in number 
theory that sprang from its connections with other parts of mathematics, 
particularly algebra and geometry, since these were the most significant for 
mathematics as a whole. For this reason we have no other chapter devoted 
purely to number theory, but there will be frequent excursions into number 
theory when we discuss algebra and what are called elliptic curves. 


3.2 Polygonal, Prime, and Perfect Numbers 


The polygonal numbers, which were studied by the Pythagoreans, result 
from a naive transfer of geometric ideas to number theory. From Figure 3.1 
it is easy to calculate an expression for the mth n-gonal number as the sum 
of a certain arithmetic series (Exercise 3.2.3) and to show, for example, that
a square is the sum of two triangular numbers. Apart from Diophantus’s 
work, which contains impressive results on sums of squares, Greek results 
on polygonal numbers were of this elementary type. 

Figure 3.1: Polygonal numbers (panel title: Triangular numbers)

On the whole, the Greeks seem to have been mistaken in attaching
much importance to polygonal numbers. There are no major theorems about
them, except perhaps the following two. The first is the theorem conjectured
by Bachet de Méziriac (1621) that every positive integer is the sum of
four integer squares. This was proved by Lagrange (1770). A generalization,
which Fermat (1670) stated without proof, is that every positive integer
is the sum of n n-gonal numbers. This was proved by Cauchy (1813)
but, somewhat disappointingly, all but four of the numbers can be 0 or 1.
A short proof of Cauchy’s theorem has been given by Nathanson (1987).
The other remarkable theorem about polygonal numbers is the formula 


$$\prod_{n=1}^{\infty} (1 - x^n) = 1 + \sum_{k=1}^{\infty} (-1)^k \left( x^{(3k^2 - k)/2} + x^{(3k^2 + k)/2} \right),$$


proved by Euler (1750) and known as Euler’s pentagonal number theorem, 
since the exponents $(3k^2 - k)/2$ are pentagonal numbers. For a proof see
Hall (1967), p. 33. 
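
The pentagonal number theorem can be checked coefficient by coefficient up to any chosen degree. The Python sketch below is a check under our own conventions, not Euler's method: it expands the product of the factors $(1 - x^n)$ for n up to N, discarding terms of degree above N, and compares the result with the series whose only nonzero exponents are $(3k^2 - k)/2$ and $(3k^2 + k)/2$. Factors with n > N cannot affect coefficients of degree at most N, so the truncation loses nothing.

```python
def product_coefficients(N):
    """Coefficients of prod_{n=1}^{N} (1 - x^n), truncated to degree N."""
    coeffs = [0] * (N + 1)
    coeffs[0] = 1
    for n in range(1, N + 1):
        # Multiply by (1 - x^n) in place. Going downward in i ensures that
        # coeffs[i - n] still holds its value from before this multiplication.
        for i in range(N, n - 1, -1):
            coeffs[i] -= coeffs[i - n]
    return coeffs


def pentagonal_series(N):
    """Coefficients of 1 + sum_k (-1)^k (x^((3k^2-k)/2) + x^((3k^2+k)/2)), to degree N."""
    coeffs = [0] * (N + 1)
    coeffs[0] = 1
    k = 1
    while (3 * k * k - k) // 2 <= N:
        sign = -1 if k % 2 else 1
        for e in ((3 * k * k - k) // 2, (3 * k * k + k) // 2):
            if e <= N:
                coeffs[e] += sign
        k += 1
    return coeffs


print(product_coefficients(7))
# prints [1, -1, -1, 0, 0, 1, 0, 1]
print(product_coefficients(100) == pentagonal_series(100))
# prints True
```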
The four-square theorem and the pentagonal number theorem were
both absorbed around 1830 into Jacobi’s theory of theta functions, a much 
larger theory. Theta functions are related to the elliptic functions that we 
study in Chapter 10. 

The prime numbers were also considered within the geometric frame- 
work, as the numbers with no rectangular representation. A prime number, 
having no divisors apart from itself and 1, has only a “linear” representa- 
tion. Of course this is merely a restatement of the definition of prime, and 
most theorems about prime numbers require much more powerful ideas; 
however, the Greeks did come up with one gem. This is the proof that there 
are infinitely many primes, in Book IX of Euclid’s Elements. 

Given any finite collection of primes $p_1, p_2, \ldots, p_n$, we can find another
by considering

$$p = p_1 p_2 \cdots p_n + 1.$$

This number is not divisible by $p_1, p_2, \ldots, p_n$ (each leaves remainder 1).
Hence either $p$ itself is a prime, and $p > p_1, p_2, \ldots, p_n$, or else it has a
prime divisor $\neq p_1, p_2, \ldots, p_n$.
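
Euclid's argument is constructive, and a few lines of Python make the two cases visible. In this sketch (the function names are ours) we form the product of the given primes plus 1 and return its smallest prime factor, found by trial division.

```python
def smallest_prime_factor(m):
    """Smallest prime dividing m (for m >= 2), by trial division."""
    d = 2
    while d * d <= m:
        if m % d == 0:
            return d
        d += 1
    return m  # no divisor up to sqrt(m), so m itself is prime


def new_prime(primes):
    """A prime not in the given list: a prime factor of (product of the list) + 1."""
    p = 1
    for q in primes:
        p *= q
    p += 1
    return smallest_prime_factor(p)


print(new_prime([2, 3, 5]))
# prints 31, since 2*3*5 + 1 = 31 is itself prime
print(new_prime([2, 3, 5, 7, 11, 13]))
# prints 59, since 30031 = 59 * 509 is not prime
```

The second call shows why the proof needs its "or else" clause: the number p need not be prime, but its prime divisors are necessarily new.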

A perfect number is one that equals the sum of its divisors (including
1 but excluding itself). For example, 6 = 1 + 2 + 3 is a perfect number, as
is 28 = 1 + 2 + 4 + 7 + 14. The concept goes back to the Pythagoreans,
but only two notable theorems about perfect numbers are known. Euclid
concludes Book IX of the Elements by proving that if $2^n - 1$ is prime, then
$2^{n-1}(2^n - 1)$ is perfect (Exercise 3.2.5). These perfect numbers are of course
even, and Euler (1849) (a posthumous publication) proved that every even
perfect number is of Euclid’s form. Euler’s surprisingly simple proof may
be found in Burton (1985), p. 504. It is unknown whether odd perfect
numbers exist; this may be the oldest open problem in mathematics.

In view of Euler’s theorem, all even perfect numbers arise from primes
of the form $2^n - 1$. These are known as Mersenne primes, after Marin
Mersenne (1588-1648), who first drew attention to the problem of find- 
ing primes of this form. It is not known whether there are infinitely many 
Mersenne primes, though larger and larger ones seem to be found quite reg- 
ularly. In recent years each new world-record prime has been a Mersenne 
prime, giving a corresponding world-record perfect number. 
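
The link between Mersenne primes and even perfect numbers is easy to check for small exponents. The sketch below is only a numerical illustration, not a proof: it tests each $2^n - 1$ for primality by trial division and confirms, by summing divisors directly, that the corresponding $2^{n-1}(2^n - 1)$ is perfect. All function names are our own.

```python
def is_prime(n):
    """Primality test by trial division (adequate for small n)."""
    if n < 2:
        return False
    d = 2
    while d * d <= n:
        if n % d == 0:
            return False
        d += 1
    return True

def proper_divisor_sum(n):
    """Sum of the divisors of n, including 1 but excluding n itself."""
    total = 1 if n > 1 else 0
    d = 2
    while d * d <= n:
        if n % d == 0:
            total += d
            if d != n // d:
                total += n // d  # the complementary divisor
        d += 1
    return total

def euclid_perfect_numbers(max_exponent):
    """Perfect numbers 2^(n-1) * (2^n - 1) for n <= max_exponent with 2^n - 1 prime."""
    found = []
    for n in range(2, max_exponent + 1):
        mersenne = 2**n - 1
        if is_prime(mersenne):
            perfect = 2**(n - 1) * mersenne
            assert proper_divisor_sum(perfect) == perfect
            found.append(perfect)
    return found

print(euclid_perfect_numbers(13))  # [6, 28, 496, 8128, 33550336]
```

Note that the exponent $n = 11$ contributes nothing, because $2^{11} - 1 = 2047 = 23 \cdot 89$ is not prime.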


EXERCISES 


Infinitely many natural numbers are not sums of three (or fewer) squares. 
The smallest of them is 7, and it can be shown as follows that no number of the 
form 8n +7 is a sum of three squares. 
3.2.1 Show that any square leaves remainder 0, 1, or 4 on division by 8. 


3.2.2 Deduce that a sum of three squares leaves remainder 0, 1, 2, 3, 4, 5, or 6 
on division by 8. 


One reason polygonal numbers play only a small role in mathematics is that 
questions about them are basically questions about squares—hence the focus is 
on problems about squares. 


3.2.3 Show that the $k$th pentagonal number is $(3k^2 - k)/2$.


3.2.4 Show that each square is the sum of two consecutive triangular numbers. 


Euclid’s theorem about perfect numbers depends on the prime divisor prop- 
erty, which will be proved in the next section. Assuming this for the moment, it 
follows that if $2^n - 1$ is a prime $p$, then the proper divisors of $2^{n-1}p$ (those unequal
to $2^{n-1}p$ itself) are

$$1, 2, 2^2, \ldots, 2^{n-1} \quad\text{and}\quad p, 2p, 2^2 p, \ldots, 2^{n-2} p.$$

3.2.5 Given that the divisors of $2^{n-1}p$ are those just listed, show that $2^{n-1}p$ is
perfect when $p = 2^n - 1$ is prime.


3.3 The Euclidean Algorithm


This algorithm is named after Euclid because its earliest known appear- 
ance is in Book VII of the Elements. However, in the opinion of many 
historians (for example, Heath (1921), p.399) the algorithm and some of 
its consequences were probably known earlier. At the very least, Euclid 
deserves credit for a masterly presentation of the fundamentals of number 
theory, based on this algorithm. 

The Euclidean algorithm is used to find the greatest common divisor 
(gcd) of two positive integers $a$, $b$. The first step is to construct the pair
$(a_1, b_1)$, where

$$a_1 = \max(a, b) - \min(a, b),$$
$$b_1 = \min(a, b),$$

and then one simply repeats this operation of subtracting the smaller number
from the larger. That is, if the pair constructed at step $i$ is $(a_i, b_i)$, then
the pair constructed at step $i + 1$ is

$$a_{i+1} = \max(a_i, b_i) - \min(a_i, b_i),$$
$$b_{i+1} = \min(a_i, b_i).$$

The algorithm terminates at the first stage when $a_{i+1} = b_{i+1}$, and this common
value is $\gcd(a, b)$. This is because taking differences preserves any
common divisors; hence when $a_{i+1} = b_{i+1}$ we have

$$\gcd(a, b) = \gcd(a_1, b_1) = \cdots = \gcd(a_{i+1}, b_{i+1}) = a_{i+1} = b_{i+1}.$$
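
The subtractive form of the algorithm translates almost word for word into code. The following Python sketch is one possible rendering, with the function name `subtractive_gcd` and the step counter being our own additions. It differs slightly from the description above in that it tests for equality before each subtraction, so that a pair with $a = b$ stops immediately.

```python
from math import gcd

def subtractive_gcd(a, b):
    """Greatest common divisor of positive integers a, b by repeated subtraction.

    Returns the gcd together with the number of subtractions performed.
    """
    if a <= 0 or b <= 0:
        raise ValueError("a and b must be positive integers")
    steps = 0
    while a != b:
        # Replace the pair by (larger - smaller, smaller).
        a, b = max(a, b) - min(a, b), min(a, b)
        steps += 1
    return a, steps

for pair in [(157, 68), (12, 15), (34, 21)]:
    value, steps = subtractive_gcd(*pair)
    assert value == gcd(*pair)  # agrees with the library routine
    print(pair, "gcd =", value, "after", steps, "subtractions")
```

Termination is guaranteed because every subtraction strictly decreases the sum of the two positive numbers in the pair.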


The sheer simplicity of the algorithm makes it easy to draw some important 
consequences. Euclid of course did not use our notation, but nevertheless 
he had results close to the following. 


1. If $\gcd(a, b) = 1$, then there are integers $m$, $n$ such that $ma + nb = 1$.
The equations

$$a_1 = \max(a, b) - \min(a, b),$$
$$b_1 = \min(a, b),$$
$$\vdots$$
$$a_{i+1} = \max(a_i, b_i) - \min(a_i, b_i),$$
$$b_{i+1} = \min(a_i, b_i)$$

show first that $a_1, b_1$ are integral linear combinations, $ma + nb$, of $a$
and $b$, hence so are $a_2, b_2$, hence so are $a_3, b_3, \ldots$, and finally this is
true of $a_{i+1} = b_{i+1}$. But $a_{i+1} = b_{i+1} = 1$, since $\gcd(a, b) = 1$; hence
$1 = ma + nb$ for some integers $m$, $n$.


2. If p is a prime number that divides ab, then p divides a or b (the 
prime divisor property). 


To see this, suppose p does not divide a. Then, since the only divisors of p
are 1 and p itself, we have gcd(p, a) = 1. Hence by the previous
result we get integers m,n such that 


ma+np = 1. 
Multiplying each side by b gives 
mab + nbp = b. 


By hypothesis, p divides ab; hence p divides both terms on the left- 
hand side, and therefore p divides the right-hand side b. 
3. Each positive integer has a unique factorization into primes (the
fundamental theorem of arithmetic).


Suppose on the contrary that some integer $n$ has two different prime
factorizations:

$$n = p_1 p_2 \cdots p_j = q_1 q_2 \cdots q_k.$$

By removing common factors, if necessary, we can assume that there
is a $p_i$ that is not among the $q$'s. But this contradicts the previous
result, because $p_i$ divides $n = q_1 q_2 \cdots q_k$, yet it does not divide any
of $q_1, q_2, \ldots, q_k$ individually, since these are prime numbers $\neq p_i$.
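
The argument for result 1 is constructive: if we carry the coefficients along with each number in the subtraction process, the integers $m$, $n$ fall out at the end. The Python sketch below does this bookkeeping for the subtractive algorithm. The name `bezout` and the triple representation are our own conventions, not Euclid's.

```python
def bezout(a, b):
    """Return (g, m, n) with g = gcd(a, b) = m*a + n*b, for positive integers a, b.

    Each number in the current pair is stored with its coefficients, so that
    a triple (x, mx, nx) always satisfies x == mx*a + nx*b.
    """
    if a <= 0 or b <= 0:
        raise ValueError("a and b must be positive integers")
    x, mx, nx = a, 1, 0  # a = 1*a + 0*b
    y, my, ny = b, 0, 1  # b = 0*a + 1*b
    while x != y:
        if x < y:
            # Swap the triples so that x is the larger number.
            x, mx, nx, y, my, ny = y, my, ny, x, mx, nx
        # Subtract the smaller from the larger, coefficients included.
        x, mx, nx = x - y, mx - my, nx - ny
    return x, mx, nx

g, m, n = bezout(157, 68)
assert g == 1 and m * 157 + n * 68 == 1
print(g, m, n)
```

The coefficients found this way are not unique, since adding a multiple of $b$ to $m$ and subtracting the same multiple of $a$ from $n$ leaves $ma + nb$ unchanged.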


Induction 


In this and the previous section we have glossed over an important point 
that Euclid was aware of but mentioned only briefly—the principle that 
an infinite decreasing sequence of positive integers is impossible. In the 
present section this infinite descent principle guarantees termination of the 
Euclidean algorithm, necessarily with the number gcd(a, b), for any pair 
of positive integers a,b. This is because the repeated subtraction process 
produces steadily decreasing numbers. 

In the previous section infinite descent played a hidden role in Euclid’s 
proof that there are infinitely many prime numbers: namely, in the assumption
that some prime number divides $p_1 p_2 \cdots p_n + 1$. In Proposition 31 of
Book VII of his Elements, Euclid proves existence of a prime divisor of 
any number N by repeatedly splitting N into smaller factors. If this pro- 
cess does not arrive at a prime factor then we get an infinite sequence of 
positive integers, each smaller than the one before. As Euclid says, this is 
“impossible in numbers.” 
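
Euclid's descent argument can be imitated by a loop that keeps replacing a number by one of its smaller factors until no further splitting is possible. The sketch below is our own illustration of this idea, with the hypothetical name `prime_divisor_by_descent`. It deliberately chooses the largest proper factor at each stage so that the decreasing chain is visible. Any choice of proper factor would do.

```python
def prime_divisor_by_descent(N):
    """Find a prime divisor of N >= 2 by repeatedly passing to a smaller factor.

    Returns the prime found and the decreasing chain of factors leading to it.
    """
    if N < 2:
        raise ValueError("N must be at least 2")
    chain = [N]
    while True:
        # Look for the largest factor d with 1 < d < N.
        factor = next((d for d in range(N - 1, 1, -1) if N % d == 0), None)
        if factor is None:
            return N, chain  # N cannot be split further, so it is prime
        N = factor           # a strictly smaller positive integer dividing the original
        chain.append(N)

print(prime_divisor_by_descent(360))  # (5, [360, 180, 90, 45, 15, 5])
```

The loop must stop because the chain is a strictly decreasing sequence of positive integers, which is exactly the principle Euclid invokes.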

Today, the impossibility of infinite descent is one way of stating math- 
ematical induction (also known as complete induction), a method of proof 
that reflects the nature of positive integers as numbers that arise from 1 by 
repeatedly adding 1. On the one hand, this property implies that we arrive 
at 1 from any positive integer by stepping downward only finitely often. 
On the other hand, it implies that any positive integer can be reached from 
1 by finitely often adding 1. In particular, a property P can be proved to 
hold for all positive integers by proving 


1. P holds for the number 1 (the base step),
2. If P holds for n, then P holds for n + 1 (the induction step). 
“Base step, induction step” is often considered the standard form of proof 
by induction, but it is perfectly fair to say that proofs by infinite descent, 
such as Euclid’s, are also proofs by induction. 

Moreover, it is not generally appreciated that number theory needs 
induction as much as Euclid needed the parallel axiom in his geometry. 
The first to appreciate this fact was Grassmann (1861), who showed that 
all the basic algebraic properties of positive integers, such as a + b = b + a
and ab = ba, can be proved by induction. Even then, Grassmann’s break- 
through was buried in a school textbook, and not brought into general 
mathematical consciousness until the 1880s, when Peano (1889) formu- 
lated an axiom system for arithmetic with an induction axiom at its core. 
This system, called Peano arithmetic or PA, is an important part of the 
foundations of mathematics, as we will see in Chapter 17. 


EXERCISES 


We can now fill the gap in the proof of Euclid’s theorem on perfect numbers 
(previous exercise set), using the prime divisor property. 


3.3.1 Use the prime divisor property to show that the proper divisors of $2^{n-1}p$,
for any odd prime $p$, are $1, 2, 2^2, \ldots, 2^{n-1}$ and $p, 2p, 2^2 p, \ldots, 2^{n-2}p$.


The result that if gcd(a,b) = 1 then 1 = ma + nb for some integers m and n 
is a special case of the following way to represent the gcd. 


3.3.2 Show that, for any integers a and b, there are integers m and n such that 
gcd(a, b) = ma + nb. 


This in turn gives a general way to find integer solutions of linear equations. 


3.3.3 Deduce from Exercise 3.3.2 that the equation ax + by = c with integer
coefficients a, b, and c has an integer solution x, y if gcd(a, b) divides c. 


The converse of this result is also valid, as one discovers when considering 
a necessary condition for ax + by = c to have an integer solution. 


3.3.4 The equation 12x + 15y = 1 has no integer solution. Why? 


3.3.5 (Solution of linear Diophantine equations) Give a test to decide, for any 
given integers a, b, c, whether there are integers x, y such that 


ax + by =c. 
3.4 Pell’s Equation 


The Diophantine equation $x^2 - Ny^2 = 1$, where $N$ is a nonsquare integer,
is known as Pell’s equation because Euler mistakenly attributed a solution 
of it to the 17th-century English mathematician Pell (it should have been 
attributed to Brouncker). Pell’s equation is probably the best-known Diophantine
equation after the equation $a^2 + b^2 = c^2$ for Pythagorean triples,
and in some ways it is more important. Solving Pell’s equation is the main 
step in the solution of the general quadratic Diophantine equation in two 
variables (see, for example, Gelfond (1961)), and also a key tool in prov- 
ing the theorem of Matiyasevich mentioned in Section 1.3 that there is no 
algorithm for solving all Diophantine equations (see, for example, Davis 
(1973) or Jones and Matiyasevich (1991)). In view of this, it is fitting that 
Pell’s equation should make its first appearance in Greek mathematics, and 
it is impressive to see how well the Greeks understood it. 
The simplest instance of Pell’s equation, 


$$x^2 - 2y^2 = 1,$$


was studied by the Pythagoreans in connection with $\sqrt{2}$. If $x$, $y$ are large
solutions to this equation, then $x/y \approx \sqrt{2}$, and the Pythagoreans found
they could generate larger and larger solutions by the recurrence relations

$$x_{n+1} = x_n + 2y_n,$$
$$y_{n+1} = x_n + y_n.$$

A short calculation shows that

$$x_{n+1}^2 - 2y_{n+1}^2 = -(x_n^2 - 2y_n^2),$$

so if $(x_n, y_n)$ satisfies $x^2 - 2y^2 = \pm 1$, then $(x_{n+1}, y_{n+1})$ satisfies $x^2 - 2y^2 = \mp 1$.
Starting with the trivial solution $(x_0, y_0) = (1, 0)$ of $x^2 - 2y^2 = 1$, we get
successively larger solutions $(x_2, y_2), (x_4, y_4), \ldots$ of $x^2 - 2y^2 = 1$. (The pairs
$(x_n, y_n)$ were known as side and diagonal numbers because the ratio $y_n/x_n$
tends to that of the side and diagonal in a square.)
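
The recurrence is easy to run by machine. The following Python sketch, with the illustrative name `side_and_diagonal`, generates the first few pairs from $(x_0, y_0) = (1, 0)$ and checks that $x_n^2 - 2y_n^2$ alternates in sign as the short calculation predicts.

```python
def side_and_diagonal(count):
    """First `count` pairs (x_n, y_n), starting from (x_0, y_0) = (1, 0)."""
    x, y = 1, 0
    pairs = []
    for _ in range(count):
        pairs.append((x, y))
        x, y = x + 2 * y, x + y  # simultaneous update from the old x and y
    return pairs

for n, (x, y) in enumerate(side_and_diagonal(8)):
    assert x * x - 2 * y * y == (-1) ** n  # +1 for even n, -1 for odd n
    ratio = x / y if y else None           # x/y approaches sqrt(2)
    print(n, (x, y), x * x - 2 * y * y, ratio)
```

The pairs with even index, $(1, 0), (3, 2), (17, 12), \ldots$, are the solutions of $x^2 - 2y^2 = 1$ mentioned above.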

But how might these recurrence relations have been discovered in the 
first place? Van der Waerden (1976) and Fowler (1980, 1982) suggest that 
the key is the Euclidean algorithm applied to line segments, an operation 
the Greeks called anthyphairesis. Given any two lengths a, b, one can 
define the sequence $(a_1, b_1), (a_2, b_2), \ldots$, as in Section 3.3, by repeated subtraction
of the smaller length from the larger. If $a$, $b$ are integer multiples
of some unit, then the process terminates as in Section 3.3, but if $b/a$ is
irrational, it continues forever.


We can well imagine the Pythagoreans would have applied anthyphairesis
to $a = 1$, $b = \sqrt{2}$. Here is what happens. If $a$, $b$ are sides of a rectangle,
each subtraction of the smaller number from the larger is represented
by cutting off the square on the shorter side (Figure 3.2). We notice that
the rectangle remaining after step 2, with sides $\sqrt{2} - 1$ and
$2 - \sqrt{2} = \sqrt{2}(\sqrt{2} - 1)$, is the same shape as the original, though the long side is
now vertical instead of horizontal. It follows that similar steps will recur
forever, which is incidentally another proof that $\sqrt{2}$ is irrational.


Figure 3.2: The Euclidean algorithm on $\sqrt{2}$ and 1


Now, however, we are interested in the relation between successive 
similar rectangles. If we let the long and short sides of successive similar 
rectangles be $x_{n+1}, y_{n+1}$ and $x_n, y_n$, we can derive recurrence relations for
$x_{n+1}, y_{n+1}$ from Figure 3.3:

$$x_{n+1} = x_n + 2y_n,$$
$$y_{n+1} = x_n + y_n.$$

Exactly the relations of the Pythagoreans! The difference is that our $x_n$, $y_n$
are not integers, and they satisfy $x^2 - 2y^2 = 0$, not $x^2 - 2y^2 = 1$.
Figure 3.3: The recurrence relation


Nevertheless, one feels that Figure 3.3 gives the most natural interpre- 
tation of these relations. The discovery that the same relations generate 
solutions of $x^2 - 2y^2 = 1$ possibly arose from wishing that the Euclidean
algorithm terminated with $x_1 = y_1 = 1$. If the Pythagoreans started with
$x_1 = y_1 = 1$ and applied the recurrence relations, then they may well have
found that $(x_n, y_n)$ satisfies $x_n^2 - 2y_n^2 = (-1)^n$, as we did earlier.

Many other instances of the Pell equation $x^2 - Ny^2 = 1$ occur in Greek
mathematics. In the seventh century CE the Indian mathematician Brahmagupta
gave a procedure for generating larger solutions of $x^2 - Ny^2 = 1$
from known solutions. But existence of a solution, for any nonsquare $N$,
was rigorously proved only in 1768 by Lagrange. The later European work
on Pell’s equation, which began in the 17th century with Brouncker and
others, was based on the continued fraction for $\sqrt{N}$, though this amounts
to the same thing as anthyphairesis (see exercises). A short but detailed 
history of Pell’s equation is in Dickson (1920), pp. 341-400. 


An interesting aspect of the theory is the very irregular relationship 
between N and the number of steps before a rectangle proportional to the 
original recurs. If the number of steps is large, the smallest nontrivial solution
of $x^2 - Ny^2 = 1$ is enormous. A famous example is what is called the
cattle problem of Archimedes (287-212 BCE), which leads to the equation

$$x^2 - 4729494y^2 = 1.$$

Its smallest solution was found by Krummbiegel and Amthor (1880) to
have 206,545 digits!
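
The irregular dependence of the smallest solution on $N$ already shows up for small values of $N$. The Python sketch below uses the standard textbook recurrence for the continued fraction of $\sqrt{N}$ in integer arithmetic, and stops at the first convergent $x/y$ with $x^2 - Ny^2 = 1$. This is a modern formulation that we assume here for convenience; it is not claimed to be the procedure of Brahmagupta, Brouncker, or Lagrange, and the name `pell_fundamental` is ours.

```python
from math import isqrt

def pell_fundamental(N):
    """Smallest solution (x, y) with y > 0 of x^2 - N*y^2 = 1, for nonsquare N > 0.

    Also returns the number of continued fraction steps that were needed.
    """
    a0 = isqrt(N)
    if a0 * a0 == N:
        raise ValueError("N must not be a perfect square")
    m, d, a = 0, 1, a0  # sqrt(N) = (sqrt(N) + m)/d with integer part a
    h_prev, h = 1, a0   # numerators of successive convergents
    k_prev, k = 0, 1    # denominators of successive convergents
    steps = 0
    while h * h - N * k * k != 1:
        # Detach the integer part and take the reciprocal of the remainder.
        m = d * a - m
        d = (N - m * m) // d
        a = (a0 + m) // d
        h_prev, h = h, a * h + h_prev
        k_prev, k = k, a * k + k_prev
        steps += 1
    return h, k, steps

for N in [2, 3, 7, 61]:
    x, y, steps = pell_fundamental(N)
    assert x * x - N * y * y == 1
    print(N, steps, x, y)
```

For $N = 61$ the smallest solution is $x = 1766319049$, $y = 226153980$, whereas the neighboring value $N = 63$ has the tiny solution $x = 8$, $y = 1$.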


A recent paper on the cattle problem, Lenstra (2002), gives a strikingly 
condensed form of solution: “for the first time in history, all infinitely many
solutions to the cattle problem are displayed in a handy little table.” 
EXERCISES


The continued fraction of a real number $\alpha > 0$ is written

$$\alpha = n_1 + \cfrac{1}{n_2 + \cfrac{1}{n_3 + \cfrac{1}{n_4 + \cdots}}}$$

where $n_1, n_2, n_3, n_4, \ldots$ are integers obtained by the following algorithm. Let

$n_1$ = integer part of $\alpha$.

Then $\alpha - n_1 < 1$ and $\alpha_1 = 1/(\alpha - n_1) > 1$, so we can take

$n_2$ = integer part of $\alpha_1$.

Then $\alpha_1 - n_2 < 1$ and $\alpha_2 = 1/(\alpha_1 - n_2) > 1$, so we can take

$n_3$ = integer part of $\alpha_2$, and so on.


3.4.1 Apply the above algorithm to the number $\alpha = 157/68$, and hence show
that

$$\frac{157}{68} = 2 + \cfrac{1}{3 + \cfrac{1}{4 + \cfrac{1}{5}}}.$$


You may notice that what happens is essentially the Euclidean algorithm applied 
to the pair (157, 68), except that repeated operations of subtraction are replaced 
by division with remainder. The integers 2,3,4,5 are the successive quotients 
obtained in these divisions: 157 divided by 68 gives quotient 2 and remainder 
21, 68 divided by 21 gives quotient 3 and remainder 5, and so on. 
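
The correspondence between division with remainder and the continued fraction can be made concrete in a few lines. In the Python sketch below, `continued_fraction` collects the successive quotients of the Euclidean algorithm and `evaluate` rebuilds the fraction from them. Both names are our own.

```python
from fractions import Fraction

def continued_fraction(a, b):
    """Successive quotients of the Euclidean algorithm (with division) on a, b."""
    quotients = []
    while b != 0:
        q, r = divmod(a, b)  # a = q*b + r with 0 <= r < b
        quotients.append(q)
        a, b = b, r
    return quotients

def evaluate(quotients):
    """Rebuild the rational number from its finite continued fraction."""
    value = Fraction(quotients[-1])
    for q in reversed(quotients[:-1]):
        value = q + 1 / value
    return value

print(continued_fraction(157, 68))  # [2, 3, 4, 5]
print(evaluate([2, 3, 4, 5]))       # 157/68
```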

Thus the Euclidean algorithm on integers a, b yields results that may be 
encoded by the (finite) continued fraction for a/b. This idea was introduced by 
Euler, and it became the preferred approach to the Euclidean algorithm for some 
mathematicians. Gauss (1801), in particular, always speaks of the Euclidean algo- 
rithm as the “continued fraction algorithm.” 

The Euclidean algorithm on a pair $(\alpha, 1)$, where $\alpha$ is irrational, is indeed
better known as the continued fraction algorithm. 


3.4.2 Interpret the operations in the continued fraction algorithm—detaching the 
integer part and taking the reciprocal of the remainder—in terms of anthy- 
phairesis. 
3.4.3 Show that 


3.4.4 Show that $\sqrt{3} + 1$ also has a periodic continued fraction, and hence derive
the continued fraction for $\sqrt{3}$.


3.5 The Chord and Tangent Methods 


In Section 1.3 we used a method of Diophantus to find all rational points on 
the circle. If p(x, y) = 0 is any quadratic equation in x and y with rational 
coefficients, and if the equation has one rational solution $x = r_1$, $y = s_1$,
then we can find any rational solution by drawing a rational line $y = mx + c$
through the point $(r_1, s_1)$ and finding its other intersection with the curve
$p(x, y) = 0$. The two intersections with the curve, $x = r_1, r_2$, say, are given
by the roots $r_1, r_2$ of the equation

$$p(x, mx + c) = 0.$$

This means that $p(x, mx + c) = k(x - r_1)(x - r_2)$, and since all coefficients on
the left-hand side are rational and $r_1$ is rational, then $k$ and $r_2$ must also be
rational. The $y$ value when $x = r_2$, $y = s_2 = mr_2 + c$, is rational since $m$ and
$c$ are; hence $(r_2, s_2)$ is another rational point on $p(x, y) = 0$. Conversely,
any line (or chord) through two rational points is rational, and hence all 
rational points are found in this way. 
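
For the unit circle the construction can be carried out in exact rational arithmetic. The following Python sketch assumes the known rational point $(-1, 0)$ and the line of rational slope $t$ through it. These particular choices, and the name `circle_point`, are ours and serve only as an illustration of the method.

```python
from fractions import Fraction

def circle_point(t):
    """Second intersection of the line y = t(x + 1) with the circle x^2 + y^2 = 1."""
    t = Fraction(t)
    # Substituting gives (1 + t^2) x^2 + 2 t^2 x + (t^2 - 1) = 0, with one root x = -1.
    # The product of the roots is (t^2 - 1)/(1 + t^2), so the other root is:
    x = (1 - t * t) / (1 + t * t)
    y = t * (x + 1)
    return x, y

for t in [Fraction(1, 2), Fraction(2, 3), Fraction(1, 4)]:
    x, y = circle_point(t)
    assert x * x + y * y == 1  # exact check, no rounding involved
    print(t, (x, y))
```

The slopes $1/2$ and $2/3$ give the points $(3/5, 4/5)$ and $(5/13, 12/13)$, which correspond to the Pythagorean triples $(3, 4, 5)$ and $(5, 12, 13)$.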

Now if p(x, y) = 0 is a curve of degree 3, its intersections with a line 
y = mx +c are given by the roots of the cubic equation p(x, mx+c) = 0. If 
we know two rational points on the curve, then the line through them will 
be rational, and its third intersection with the curve will also be rational, by 
an argument like the preceding one. This fact becomes more useful when 
one realizes that the two known rational points can be taken to coincide, 
in which case the line is the tangent through the known rational point. 
Thus from one rational solution we can generate another by the tangent 
construction, and from two we can construct a third by taking the chord 
between the two. 
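
The tangent construction can likewise be carried out exactly. The sketch below treats cubics of the special form $y^2 = x^3 + ax^2 + bx + c$, computes the slope of the tangent by implicit differentiation, and uses the fact that the three roots of the resulting cubic in $x$ sum to $m^2 - a$. The function name `tangent_point` is ours, and the sample curve $y^2 = x^3 - 2$ with its rational point $(3, 5)$ is chosen purely for illustration.

```python
from fractions import Fraction

def tangent_point(a, b, c, x0, y0):
    """Other intersection of the tangent at (x0, y0) with y^2 = x^3 + a x^2 + b x + c."""
    x0, y0 = Fraction(x0), Fraction(y0)
    if y0 * y0 != x0**3 + a * x0**2 + b * x0 + c:
        raise ValueError("the given point is not on the curve")
    if y0 == 0:
        raise ValueError("the tangent is vertical when y0 = 0")
    m = (3 * x0**2 + 2 * a * x0 + b) / (2 * y0)  # slope of the tangent
    # Substituting y = y0 + m(x - x0) gives a cubic in x whose roots sum to m^2 - a.
    # Since x0 is a double root, the remaining root is:
    x1 = m * m - a - 2 * x0
    y1 = y0 + m * (x1 - x0)
    return x1, y1

x1, y1 = tangent_point(0, 0, -2, 3, 5)
assert y1 * y1 == x1**3 - 2  # the new point lies on the curve
print(x1, y1)                # 129/100 383/1000
```

Applying the function again to the new point produces further rational points, with rapidly growing numerators and denominators.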

Diophantus found rational solutions to cubic equations in what seems 
to have been essentially this way. The surviving works of Diophantus reveal 
little of his methods, but a plausible reconstruction—an algebraic version 
of the tangent and chord constructions—has been given by Bashmakova 
(1981). Probably the first to understand Diophantus’s methods was Fermat, 
in the 17th century, and the first to give the tangent and chord interpretation 
was Newton (1670s). 


Figure 3.4: Cubic curve $y^2 = x^3 - 3x^2 + 3x + 1$ and tangent


In contrast to the quadratic case, we have no choice in the slope of the 
rational line for cubics. Thus it is unclear whether this method will give all 
rational points on a cubic. A remarkable theorem, conjectured by Poincaré 
(1901) and proved by Mordell (1922), says that all rational points can 
be generated by tangent and chord constructions applied to finitely many 
points. However, it is still not known whether there is an algorithm for 
finding a finite set of such rational generators on each cubic curve. 
EXERCISES


3.5.1 Explain the solution $x = 21/4$, $y = 71/8$ to $x^3 - 3x^2 + 3x + 1 = y^2$ given by
Diophantus (Heath (1910), p. 242) by constructing the tangent through the
obvious rational point on this curve (Figure 3.4).

3.5.2 Rederive the following rational point construction of Viète (1593), p. 145.
Given the rational point $(a, b)$ on $x^3 - y^3 = a^3 - b^3$, show that the tangent
at $(a, b)$ is

$$y = \frac{a^2}{b^2}(x - a) + b,$$

and that the other intersection of the tangent with the curve is the rational
point

$$x = a\,\frac{a^3 - 2b^3}{a^3 + b^3}, \qquad y = b\,\frac{b^3 - 2a^3}{a^3 + b^3}.$$
4 
Infinity in Greek Mathematics 


PREVIEW 


Perhaps the most interesting—and most modern—feature of Greek math- 
ematics is its treatment of infinity. The Greeks feared infinity and tried to 
avoid it, but in doing so they laid the foundations for a rigorous treatment 
of infinite processes in 19th century calculus. 

The most original contributions to the theory of infinity in ancient 
times were the theory of proportions and the method of exhaustion. Both 
were due to Eudoxus and expounded in Books V and XII of Euclid’s Ele- 
ments. 

The theory of proportions develops the idea that a “quantity” $\lambda$ (what
we would now call a real number) can be known by its position among the
rational numbers. That is, $\lambda$ is known if we know the rational numbers less
than $\lambda$ and the rational numbers greater than $\lambda$. In a sense, the space less
than $\lambda$ can be “exhausted” by rational numbers.

The method of exhaustion generalizes this idea from quantities to regions 
of the plane or space. A region becomes known (in area or volume) when 
its position among known areas or volumes is known. For example, we 
know the area of a circle when we know the areas of the polygons inside 
it and the areas of polygons outside it; we know the volume of a pyramid 
when we know the volumes of stacks of prisms inside it and outside it. 

Using this method, Euclid found that the volume of a tetrahedron 
equals 1/3 of its base area times its height, and Archimedes found the area 
of a parabolic segment. Both of them relied on an infinite process that is 
fundamental to many calculations of area and volume: the summation of 
an infinite geometric series. 
4.1 Fear of Infinity 


Reasoning about infinity is one of the characteristic features of mathemat- 
ics as well as its main source of conflict. In Chapter 1 we saw the conflict 
that arose from the discovery of irrationals, and in this chapter we will see 
that the Greeks rejected not only irrational numbers but infinite processes 
in general. In fact, until the late 19th century most mathematicians were 
reluctant to accept infinity as more than “potential.” The infinitude of a 
process, collection, or magnitude was understood as the possibility of its 
indefinite continuation, and no more, certainly not the possibility of completion.
For example, the natural numbers 1, 2, 3, ... can be accepted as a
potential infinity (generated from 1 by the process of repeatedly adding
1) without accepting that there is a completed totality {1, 2, 3, ...}. The
same goes for any sequence $x_1, x_2, x_3, \ldots$ (of rational numbers, say), where
$x_{n+1}$ is obtained from $x_n$ by a definite rule.

And yet a beguiling possibility arises when $x_n$ tends to a limit $x$. If we
already accept $x$ (for geometric reasons, say) then it is tempting to view
$x$ as some kind of completion of the sequence $x_1, x_2, x_3, \ldots$. It seems that
the Greeks were afraid to draw such conclusions. According to tradition,
they were frightened off by the paradoxes of Zeno, around 450 BCE.

We know of Zeno’s arguments only through Aristotle, who quotes them 
in his Physics in order to refute them, and it is not clear what Zeno himself 
wished to achieve. Was there, for example, a tendency toward speculation 
about infinity that he disapproved of? His arguments are so extreme they 
could almost be parodies of loose arguments about infinity he heard among 
his contemporaries. Consider his first paradox, the dichotomy: 


There is no motion because that which is moved must arrive 
at the middle (of its course) before it arrives at the end. 


Aristotle, Physics, Book VI, Ch. 9 


The full argument presumably is that before getting anywhere one must 
first get half way, and before that a quarter of the way, and before that one 
eighth of the way, ad infinitum. The completion of this infinite sequence 
of steps no longer seems impossible to most mathematicians, since it rep- 
resents nothing more than an infinite set of points within a finite interval. 
It must have frightened the Greeks though, because in all their proofs they 
were very careful to avoid completed infinities and limits. 
The first mathematical processes we would recognize as infinite may
be due to the Pythagoreans, for example, the recurrence relations

$$x_{n+1} = x_n + 2y_n,$$
$$y_{n+1} = x_n + y_n$$

for generating integer solutions of the equations $x^2 - 2y^2 = \pm 1$. We saw
in Section 3.4 why it is likely that these relations arose from an attempt to
understand $\sqrt{2}$, and it is easy for us to see that $x_n/y_n \to \sqrt{2}$ as $n \to \infty$.

However, it is unlikely that the Pythagoreans would have viewed $\sqrt{2}$
as a limit or seen the sequence as a meaningful object. The most we can say
is that, by stating a recurrence, the Pythagoreans implied a sequence with
limit $\sqrt{2}$. Only a much later generation of mathematicians could accept
the infinite sequence as such and appreciate its ability to define a limit. 

In a problem where we would reach a solution a by a limiting process, 
the Greeks would instead eliminate any solution but a. They would show 
that any number <a was too small, and any number >a was too large, to 
be the solution. We will study some examples of this style of proof below 
and see how it ultimately bore fruit in the foundations of mathematics. As 
a method of solving problems, however, it was sterile: how does one guess 
the number a in the first place? When mathematicians returned to problems 
of finding limits in the 17th century, they were in too much of a hurry for 
the rigorous methods of the Greeks. Their dubious, but efficient, methods 
of infinitesimals were criticized by the Zeno of the time, Bishop Berkeley, 
but little was done to meet his objections until much later. It was Dedekind, 
Weierstrass, and others in the 19th century who eventually restored Greek 
standards of rigor. 

The story of rigor lost and rigor regained took an amazing turn when 
a previously unknown manuscript of Archimedes, The Method, was dis- 
covered in 1906. In it he reveals that his deepest results were found using 
dubious infinitary arguments, and only later proved rigorously, because,
as he says, “It is of course easier to supply the proof when we have previ- 
ously acquired some knowledge of the questions by the method, than it is 
to find it without any previous knowledge.” 

The importance of this statement goes beyond its revelation that infin- 
ity can be used to discover results that are not initially accessible to logic. 
Archimedes was probably the first mathematician candid enough to explain 
that there is a difference between the way theorems are discovered and the 
way they are proved. 
4.2 Eudoxus’s Theory of Proportions 


The theory of proportions is credited to Eudoxus (around 400-350 BCE) and
is expounded in Book V of Euclid’s Elements. Its purpose is to let lengths 
(and other geometric quantities) be treated as precisely as numbers, while 
admitting only the use of rational numbers. We saw the motivation for this 
in Section 1.5: the Greeks could not accept irrational numbers, but they 
accepted irrational geometric quantities such as the diagonal of the unit 
square. To simplify the exposition of the theory, let us call lengths rational 
if they are rational multiples of a fixed length. 

The idea of Eudoxus was to say that a length $\lambda$ is determined by those
rational lengths less than it and those greater than it. To be precise, he says
$\lambda_1 = \lambda_2$ if any rational length $< \lambda_1$ is also $< \lambda_2$, and vice versa. Likewise
$\lambda_1 < \lambda_2$ if there is a rational length $> \lambda_1$ but $< \lambda_2$. This definition uses the
rationals to give an infinitely sharp notion of length while avoiding any
overt use of infinity. Of course the infinite set of rational lengths $< \lambda$ is
present in spirit, but Eudoxus avoids mentioning it by speaking of an arbitrary
rational length $< \lambda$.

The theory of proportions was so successful that it delayed the develop- 
ment of a theory of real numbers for 2000 years. This was ironic, because 
the theory of proportions can be used to define irrational numbers just as 
well as lengths. It was understandable though, because the common irra- 
tional lengths, such as the diagonal of the unit square, arise from construc- 
tions that are intuitively clear and finite from the geometric point of view. 
Any arithmetic approach to $\sqrt{2}$, whether by sequences, decimals, or continued
fractions, is infinite and therefore less intuitive. Until the 19th century
this seemed a good reason for considering geometry to be a better
foundation for mathematics than arithmetic. Then the problems of geom- 
etry came to a head, and mathematicians began to fear geometric intu- 
ition as much as they had previously feared infinity. There was a purge of 
geometric reasoning from the textbooks and industrious reconstruction of 
mathematics on the basis of numbers and sets of numbers. Set theory is 
discussed further in Chapter 17. Suffice to say, for the moment, that set 
theory depends on the acceptance of completed infinities. 

The beauty of the theory of proportions was its adaptability to this new 
climate. Instead of rational lengths, take rational numbers. Instead of com- 
paring existing irrational lengths by means of rational lengths, construct 
irrational numbers from scratch using sets of rationals! The length $\sqrt{2}$ is
determined by the two sets of positive rationals

$$L_{\sqrt{2}} = \{r : r^2 < 2\}, \qquad U_{\sqrt{2}} = \{r : r^2 > 2\}.$$


Dedekind (1872) decided in effect to let $\sqrt{2}$ be this pair of sets! In general,
let any partition of the positive rationals into sets $L$, $U$ such that any
member of L is less than any member of U be a positive real number. This 
idea, now known as a Dedekind cut, is more than just a twist of Eudoxus; 
it gives a complete and uniform construction of all real numbers, or points 
on the line, using just the rationals. In short, it is an explanation of the 
continuous in terms of the discrete, finally resolving the fundamental con- 
flict in Greek mathematics. Dedekind was understandably pleased with his 
achievement. He wrote 


The statement is so frequently made that the differential calcu- 
lus deals with continuous magnitude, and yet an explanation 
of this continuity is nowhere given.... It then only remained 
to discover its true origin in the elements of arithmetic and 
thus at the same time secure a real definition of the essence of 
continuity. I succeeded Nov. 24 1858. 


Dedekind (1872), p.2 
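
A lower set can be modeled on a computer as a membership test on rationals, which makes the Eudoxus comparison of two quantities a finite search for a separating rational. The Python sketch below is a rough illustration under our own conventions: the names `lower_sqrt2`, `lower_rational`, and `witness_less_than` are hypothetical, the search is limited to small denominators, and for a rational $x$ the witness returned could be $x$ itself, so the test is only reliable when $x$ is irrational.

```python
from fractions import Fraction

def lower_sqrt2(r):
    """Membership test for the lower set of sqrt(2): positive rationals r with r^2 < 2."""
    return r > 0 and r * r < 2

def lower_rational(a):
    """Membership test for the lower set {r : 0 < r < a} of a positive rational a."""
    return lambda r: 0 < r < a

def witness_less_than(lower_x, lower_y, max_denominator=50):
    """Search for a rational r in the lower set of y but not in the lower set of x.

    Such an r separates x from y and so witnesses x < y.
    Returns None if no witness with a small denominator is found.
    """
    for q in range(1, max_denominator + 1):
        for p in range(1, 4 * q + 1):  # rationals p/q between 0 and 4
            r = Fraction(p, q)
            if lower_y(r) and not lower_x(r):
                return r
    return None

three_halves = lower_rational(Fraction(3, 2))
r = witness_less_than(lower_sqrt2, three_halves)
assert r is not None and r * r > 2 and r < Fraction(3, 2)
print("sqrt(2) < 3/2, witnessed by", r)
print(witness_less_than(three_halves, lower_sqrt2))  # None: no rational shows 3/2 < sqrt(2)
```

Only finitely many rationals are ever inspected, which is in the spirit of Eudoxus: the infinite lower set is never written down, only tested at particular rational lengths.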


EXERCISES 


There is only one Dedekind cut $(L, U)$ corresponding to an irrational number
$a$, but there are two cuts corresponding to a rational number $a$:

$$L = \{r : r \le a\}, \qquad U = \{r : r > a\}$$

and

$$L = \{r : r < a\}, \qquad U = \{r : r \ge a\}.$$

To unify the theory of all reals we choose the latter cut, call it

$$L_a = \{r : r < a\}, \qquad U_a = \{r : r \ge a\},$$

as the standard way to represent a rational $a$. We can then say, whether $x$ is rational
or irrational, that the lower set for $x$ is

$$L_x = \{r : r < x\}.$$

Now we use lower sets to define $x + y$ and $xy$ for positive reals $x$ and $y$ as follows:

$$L_{x+y} = \{r + s : r < x \text{ and } s < y, \text{ where } r, s \text{ are rational}\}$$

$$L_{xy} = \{rs : r < x \text{ and } s < y, \text{ where } r, s \text{ are rational}\}.$$
4.2.1 Show that these are valid definitions of $x + y$ and $xy$ when $x$ and $y$ are
rational.


The test of these definitions, as Dedekind realized, is that they allow rigorous
proofs of results like $\sqrt{2}\sqrt{3} = \sqrt{6}$ that (in Dedekind’s opinion) had never been
rigorously proved before. Proofs using Dedekind’s definitions are possible, but
not trivial. Even to prove that $\sqrt{2}\sqrt{2} = 2$ one has to prove the next two results.


4.2.2 If $r^2 < 2$ and $s^2 < 2$, show that $rs < 2$.


4.2.3 If a rational $t < 2$, show that $t = rs$ for some rationals $r$, $s$ with $r^2 < 2$,
$s^2 < 2$. (Hint: Choose $r$ with $t < r^2 < 2$.)


4.2.4 Why do Exercises 4.2.2 and 4.2.3 show that $\sqrt{2}\sqrt{2} = 2$?

4.2.5 Give a similar proof that $\sqrt{2}\sqrt{3} = \sqrt{6}$.


4.3 The Method of Exhaustion


The method of exhaustion, also credited to Eudoxus, is a generalization 
of his theory of proportions. Just as an irrational length is determined by 
the rational lengths on either side of it, more general unknown quantities 
become determined by arbitrarily close approximations using known fig- 
ures. Examples given by Eudoxus (and expounded in Book XII of Euclid’s 
Elements) are an approximation of the circle by inner and outer polygons 
(Figure 4.1) and an approximation of a tetrahedron by stacks of prisms 
(Figure 4.2, which shows the most obvious approximation, not the cun- 
ning one used by Euclid, which is shown in Figure 4.5). In both cases the 
approximating figures are known quantities, on the basis of the theory of 
proportions and the theorem that area of triangle = 1/2 base × height.


Figure 4.1: Approximating a circle 
Figure 4.2: Approximating a tetrahedron 


The polygonal approximations are used to show that the area of any 
circle is proportional to the square on its radius, as follows. Suppose
$P_1 \subset P_2 \subset P_3 \subset \cdots$ are the inner polygons and $Q_1 \supset Q_2 \supset Q_3 \supset \cdots$ are the
outer polygons. Each polygon is obtained from its predecessor by bisecting
the arcs between its vertices, as shown in Figure 4.1. It can then be
shown, by elementary geometry, that the area difference $Q_i - P_i$ can be
made arbitrarily small, and hence $P_i$ approximates the area $C$ of the circle
arbitrarily closely.

On the other hand, elementary geometry also shows that the area $P_i$ is
proportional to the square, $R^2$, of the radius. Writing the area as $P_i(R)$ and
using the theory of proportions to handle ratios of areas, we have

\[ P_i(R) : P_i(R') = R^2 : R'^2. \tag{1} \]

Now let C(R) denote the area of the circle of radius R, and suppose

\[ C(R) : C(R') < R^2 : R'^2. \tag{2} \]

By choosing a $P_i$ that approximates C sufficiently closely we also get

\[ P_i(R) : P_i(R') < R^2 : R'^2, \]

which contradicts (1). Hence the < sign in (2) is incorrect, and we can
similarly show that > is incorrect. Thus the only possibility is

\[ C(R) : C(R') = R^2 : R'^2, \]

that is, the area of a circle is proportional to the square of its radius. In other
words, the constant of proportionality π in the formula $\pi R^2$ for the area of the
circle is independent of the radius R.

Notice that “exhaustion” does not mean using an infinite sequence of
steps to show that area is proportional to the square of the radius. Rather,
one refutes any disproportionality in a finite number of steps (by going to
a suitable $P_i$). This is typical of the way in which exhaustion arguments
avoid mention of limits and infinity.
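
The squeeze between inner and outer polygons is easy to watch numerically. The sketch below is an illustration under stated assumptions, not Eudoxus' argument: it assumes regular polygons starting from squares, uses floating-point arithmetic, and the function name `polygon_areas` is hypothetical. The half-angle formula plays the role of bisecting the arcs, so the value of π is never used in the computation.

```python
import math

def polygon_areas(radius, doublings):
    """Areas (inner, outer) of regular polygons inscribed in and circumscribed
    about a circle, starting from squares and doubling the sides `doublings` times."""
    n = 4
    c = math.sqrt(0.5)               # cos(pi/4), from the inscribed square
    for _ in range(doublings):
        c = math.sqrt((1 + c) / 2)   # half-angle formula: cos of half the angle
        n *= 2
    s = math.sqrt(1 - c * c)         # matching sine
    inner = n * radius**2 * s * c    # n triangles with apex angle 2*pi/n
    outer = n * radius**2 * s / c    # n kites tangent to the circle
    return inner, outer

for k in range(0, 11, 2):
    inner, outer = polygon_areas(1.0, k)
    print(4 * 2**k, inner, outer, outer - inner)
```

The gap `outer - inner` shrinks at each doubling, and the ratio of inner areas for two different radii equals the ratio of the squared radii at every stage, as relation (1) asserts.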

It is interesting that Euclid does not need the method of exhaustion in 
the theory of area for polygons. It can be done entirely by dissection arguments
such as that showing area of triangle = 1/2 base × height (Figure 4.3).
In fact, it was shown by Farkas Bolyai (1832a) that any polygons P, Q of
equal area can be cut into polygonal pieces $P_1, \ldots, P_n$ and $Q_1, \ldots, Q_n$ such
that $P_i$ is congruent to $Q_i$. Thus we can define polygons to be equal in area
if they possess dissections into such correspondingly congruent pieces. 


Figure 4.3: Area of a triangle 


In Hilbert’s famous list of mathematical problems, Hilbert (1900a), the 
third was to decide whether an analogous definition was possible for poly- 
hedra. Dehn (1900) showed that it was not; in fact, a tetrahedron and a cube 
of equal volume cannot be dissected into corresponding congruent polyhe- 
dral pieces. Hence infinite processes of some kind, such as the method of 
exhaustion, are needed to define equality of volume. A readable account 
of Dehn’s theorem and related results may be found in Boltyansky (1978). 


EXERCISES 


Another approach to the volume of the tetrahedron by exhaustion is in Euclid 
(see Heath (1925), Book XII, Proposition 4). He dissected the tetrahedron into 
two smaller tetrahedra and two prisms as shown in Figure 4.4, with vertices at 
the edge midpoints of the original tetrahedron. (There is a “front” prism, with 
triangles left and right, and a “back” prism, with triangles top and bottom.) 


4.3.1 Show that the two prisms occupy more than half of the tetrahedron. (Hence, 
by iterating the construction in the smaller tetrahedra, the volume of the 
tetrahedron may be approximated arbitrarily closely by prisms.) 

Figure 4.4: Euclid’s dissection of the tetrahedron


4.3.2 Show that the volume of the two prisms in Figure 4.4 is 1/4 base × height
(the base and height of the tetrahedron, that is). 


By computing the volumes of the corresponding prisms in the smaller tetrahedra
(Figure 4.5), and repeating, we find the volume of the original tetrahedron
as a sum of a geometric series.


Figure 4.5: Repeated dissection of the tetrahedron 


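4.3.3 Show that the total volume of the prisms is

\[ \left( \frac{1}{4} + \frac{1}{4^2} + \frac{1}{4^3} + \cdots \right) \text{base} \times \text{height} = \frac{1}{3}\, \text{base} \times \text{height}. \]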

In the next section we study a construction of Archimedes that is curiously 
similar to this one of Euclid. Each step cuts pieces out of the leftovers from 
the previous step and leads to a similar geometric series. While it is convenient 
for us to view the process as summing an infinite geometric series, both Euclid 
and Archimedes applied an exhaustion argument to finite (but arbitrarily long)
geometric series.
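
The finite version of the series can be checked exactly. In this sketch (hypothetical function names; exact fractions are used so that no rounding is involved), `prism_fraction` sums the first few terms of 1/4 + 1/4² + ⋯, and `steps_to_exceed` finds how many steps are needed to pass any proposed value smaller than 1/3, which is the shape of an exhaustion argument.

```python
from fractions import Fraction

def prism_fraction(steps):
    """Exact sum 1/4 + 1/4**2 + ... + 1/4**steps."""
    return sum(Fraction(1, 4**k) for k in range(1, steps + 1))

def steps_to_exceed(target):
    """Smallest number of steps whose partial sum exceeds `target` (target < 1/3)."""
    assert target < Fraction(1, 3)
    steps = 1
    while prism_fraction(steps) <= target:
        steps += 1
    return steps

for n in range(1, 6):
    s = prism_fraction(n)
    print(n, s, Fraction(1, 3) - s)   # the shortfall is 1/(3 * 4**n)
```

Every partial sum falls short of 1/3, yet any value below 1/3 is exceeded after finitely many steps.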


4.4 The Area of a Parabolic Segment


The method of exhaustion was brought to full maturity by Archimedes 
(287–212 BCE). Among his most famous results are the volume and surface
area of the sphere and the area of a parabolic segment. As mentioned
in Section 4.1, Archimedes first discovered these results by nonrigorous 
methods, later confirming them by the method of exhaustion. Perhaps the 
most interesting and natural of his exhaustion proofs is the one for the area 
of the parabolic segment. The segment is exhausted by polygons similarly 
to Eudoxus’ exhaustion of the circle, but the area is obtained outright and 
not merely in proportion to another figure. 

To simplify the construction slightly we assume that the segment is 
cut off by a chord perpendicular to the axis of symmetry of the parabola. 
Archimedes divides the parabolic segment into triangles $\Delta_1, \Delta_2, \Delta_3, \ldots$, as
shown in Figure 4.6 (labeled by their subscripts). The middle vertex of 
each triangle lies on the parabola halfway between the other two (measured 
horizontally). These triangles clearly exhaust the parabolic segment, and 
so it remains to compute their area. Quite surprisingly, this turns into a 
geometric series. 


Figure 4.6: The parabolic segment 


We indicate how this comes about by looking at $\Delta_3$ (Figure 4.7).


Figure 4.7: A triangle in the segment (points Y, S, Z along the top; O, P, X along the bottom)


Since $OP = \tfrac{1}{2}OX$, $PQ = \tfrac{1}{4}PS$ by definition of the parabola. On the
other hand, $SR = \tfrac{1}{2}PS$, so $QR = \tfrac{1}{4}PS$. Now $\Delta_3$ is the sum of the triangles
RQZ and OQR, which have the same base RQ and height OP = PX, hence
equal area. We have just seen that RQZ has half the base of SRZ and it
has the same height; hence (calling figures equal when they have the same
area)

\[ \Delta_3 = SRZ = \tfrac{1}{4}\,OYZ = \tfrac{1}{8}\,\Delta_1. \]

By symmetry, $\Delta_2 = \Delta_3$, so $\Delta_2 + \Delta_3 = \tfrac{1}{4}\Delta_1$.


A similar argument shows that

\[ \Delta_4 + \Delta_5 + \Delta_6 + \Delta_7 = \tfrac{1}{16}\,\Delta_1, \]

and so on, each new chain of triangles having one-fourth the area of the
previous chain. Consequently,

\[ \text{area of parabolic segment} = \Delta_1 \left[ 1 + \frac{1}{4} + \left(\frac{1}{4}\right)^2 + \cdots \right] = \frac{4}{3}\,\Delta_1. \]

Of course, Archimedes does not use the infinite series but uses exhaustion,
showing that any area $< \tfrac{4}{3}\Delta_1$ can be exceeded by taking sufficiently many
of the triangles $\Delta_i$. The sum of the finite geometric series needed for this
was known from Euclid’s Elements, Book IX, where Euclid used it for the
theorem about perfect numbers (Section 3.2).
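
The one-fourth ratio between successive chains of triangles can be confirmed by direct computation. The sketch below makes several assumptions not in the text: it takes the parabola $y = x^2$ cut off by the chord y = 1, uses the shoelace formula for triangle areas, and the function names are hypothetical. Level 0 is the large triangle, level 1 is the pair added next, and so on.

```python
from fractions import Fraction

def triangle_area(p, q, r):
    """Area of triangle pqr by the shoelace formula."""
    return abs((q[0] - p[0]) * (r[1] - p[1]) - (r[0] - p[0]) * (q[1] - p[1])) / 2

def chain_area(level):
    """Total area of the 2**level triangles added at a given level for the
    segment of y = x**2 below the chord y = 1."""
    pieces = 2**level
    width = Fraction(2, pieces)
    total = Fraction(0)
    for i in range(pieces):
        left = -1 + i * width
        right = left + width
        mid = (left + right) / 2      # middle vertex is halfway horizontally
        total += triangle_area((left, left**2), (mid, mid**2), (right, right**2))
    return total

running = Fraction(0)
for level in range(6):
    running += chain_area(level)
    print(level, chain_area(level), running)
```

Each chain has exactly one-fourth the area of the previous one, and the running totals approach 4/3 of the first triangle without ever reaching it.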


EXERCISES


Archimedes’ method of approximation by triangles was a brilliant success 
on the parabolic segment, but not suited to many other curves. A more generally 
useful method is approximation by rectangles, probably known to you from cal- 
culus. The area of a parabolic segment can also be computed in this way, though 
less gracefully, and indeed Archimedes did this too. We look at other curved areas 
that can be evaluated by rectangle approximation in Section 8.2. 

Probably the simplest area that cannot be found by this method is the area
under the hyperbola y = 1/x, from x = 1 to x = t. This is because the area
in question is log t, and the logarithm function cannot be defined by elementary
means. But if instead one takes the area to be log t by definition, then it is possible
to derive the basic property of the logarithm,

\[ \log ab = \log a + \log b, \]

by means Archimedes would have understood.


4.4.1 Suppose we approximate the area log a under y = 1/x from 1 to a by n
rectangles of equal width, as shown in Figure 4.8.


Figure 4.8: Rectangle approximation to log a (curve y = 1/x)


Show that the corresponding approximation to the area under y = 1/x from 
b to ab by n rectangles has exactly the same area. (In fact, corresponding 
rectangles have equal area.) 


4.4.2 Deduce from Exercise 4.4.1, by the method of exhaustion, that the area 
under y = 1/x from 1 to a equals the area under y = 1/x from b to ab.


4.4.3 Deduce from Exercise 4.4.2, and the above definition of log, that 


log ab = log a + log b.
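
The equality of rectangle sums claimed in Exercise 4.4.1 can be tested with exact fractions before attempting a proof. This is a sketch under assumptions: the function name is hypothetical, and the height of each rectangle is taken at its right-hand edge, which may differ from the choice in Figure 4.8.

```python
from fractions import Fraction

def rectangle_sum(start, end, n):
    """Total area of n equal-width rectangles under y = 1/x on [start, end],
    each with its height taken at the right-hand edge."""
    width = Fraction(end - start) / n
    return sum(width / (start + i * width) for i in range(1, n + 1))

a, b = 3, 5
for n in (1, 4, 10):
    # Stretching [1, a] by the factor b widens each rectangle by b
    # and lowers it by the same factor, so the areas agree exactly.
    print(n, rectangle_sum(1, a, n), rectangle_sum(b, a * b, n))
```

The two columns agree for every n, which is the numerical content of the exercise.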


Polynomial Equations


PREVIEW 


The first phase in the history of algebra was the search for solutions of 
polynomial equations. The “degree of difficulty” of an equation corre- 
sponds rather well to the degree of the corresponding polynomial. 

Linear equations are easily solved, and 2000 years ago the Chinese 
were even able to solve n linear equations in n unknowns by the method 
we now call “Gaussian elimination.” 

Quadratic equations are harder to solve, because they generally require 
the square root operation. But the solution—essentially the same as that 
taught in high schools today—was discovered independently in many cul- 
tures more than 1000 years ago. 

The first really hard case is the cubic equation, whose solution requires 
both square roots and cube roots. Its discovery by Italian mathematicians 
in the early 16th century was a decisive breakthrough, and equations quickly 
became the language of virtually all mathematics then known. (See, for
example, algebraic geometry in Chapter 6 and calculus in Chapter 8.)

Despite this breakthrough, the problem of polynomial equations was 
far from solved. The obstacle is the quintic equation, the general equation
of degree 5. In the 1820s it finally became clear that the quintic equation 
is not solvable in the sense that equations of lower degree are solvable. 
But explaining why this is so requires a new, and more abstract, concept 
of algebra (see Chapter 14). 

A rather special, but important, thread in the history of algebra is the 
binomial theorem. Here we sketch its origins and how they led to early 
developments in combinatorics, probability, and number theory. 


5.1 Algebra


The word “algebra” comes from the Arabic word al-jabr meaning “restoring.”
It passed into mathematics through the book Al-jabr w’al mûqabala
(Science of restoring and opposition) of al-Khwarizmi in 830 CE, a work on
the solution of equations. In this context, “restoring” meant adding equal 
terms to both sides and “opposition” meant setting the two sides equal. For 
centuries, al-jabr more commonly meant the resetting of broken bones, and 
the surgical meaning accompanied the mathematical one when “al-jabr” 
became “algebra” in Spanish, Italian, and English. Even today the surgi- 
cal meaning is included in the Oxford English Dictionary. Al-Khwarizmi’s 
own name has given us the word “algorithm,” so his work has had a lasting 
impact on mathematics, even though its content was quite elementary. 

His algebra went no further than the solution of quadratic equations, 
which had already been understood by the Babylonians, presented from the 
geometric viewpoint by Euclid, and reduced to a formula by Brahmagupta 
(628) (see Section 5.3). Brahmagupta’s work, the high point of Indian mathematics
to that time, was more advanced than al-Khwarizmi’s in several
respects—notation, admission of negative numbers, and the treatment of 
Diophantine equations—even though it predated al-Khwarizmi and was 
very likely known to him. Indian mathematics had spread to the Muslim 
world with the general promotion of culture by the eighth-century caliphs 
of Baghdad, and Muslim mathematicians acknowledged the Indian origin 
of certain ideas, such as decimal numerals. Why then did al-Khwarizmi’s 
work rather than Brahmagupta’s become the definitive “algebra”?

Perhaps the time was ripe for the idea of algebra to be cultivated, and 
the simple algebra of al-Khwarizmi served this purpose better than those 
of his more sophisticated predecessors. In Indian mathematics, algebra was 
inseparable from number theory. In Greek mathematics, algebra was hidden 
by geometry. Other possible sources of algebra, Babylonia and China, were 
lost or cut off from the West until it was too late for them to be influential. 
The concept of algebra that emerged from al-Khwarizmi—the theory of 
polynomial equations—lasted for 1000 years. Only in the 19th century did 
algebra grow beyond these bounds, and this was a time when most fields 
of mathematics were outgrowing their established habitats. For a detailed 
history of algebra, which emphasizes the tradition of solving equations, see 
Katz and Parshall (2014). For the new developments from the 19th century 
onward see Gray (2018). 

The early algebraic methods were essentially geometric methods, as we
will see in the case of quadratic equations in Section 5.3. Algebraic methods 
for solving equations became distinct from, and superior to, the geometric 
only with new manipulative techniques and efficient notation in the 16th 
century (Section 5.5). Algebra did not break away from geometry, how- 
ever, but actually gave it a new lease on life, thanks to the development of 
algebraic geometry by Fermat and Descartes around 1630. This reunion of 
algebra and geometry at a higher level is discussed in Chapter 6. 

The story of algebraic geometry unfolds along with the story of polyno- 
mial equations, becoming entwined with many other mathematical threads 
in the process. One we have already seen is Diophantus’s chord and tangent 
method for finding rational solutions of equations (Section 3.5). Another 
relevant event, though not historically connected with Western mathemat- 
ics, was the method of elimination developed by Chinese mathematicians 
between the early Christian era and the Middle Ages. Since this method 
concerns equations of the lowest degree, it is logical to discuss it first. 


5.2 Linear Equations and Elimination 


The Chinese discovered a method for solving linear equations in any number
of unknowns during the Han dynasty (206 BCE–220 CE). It appears in the
famous book Jiuzhang suanshu (Nine Chapters of Mathematical Art; see 
Shen et al. (1999)), which survives today in a third-century version with a 
commentary by Liu Hui. The method was essentially what we call Gaussian 
elimination, systematically eliminating terms in a system 


\[
\begin{aligned}
a_{11}x_1 + a_{12}x_2 + \cdots + a_{1n}x_n &= b_1 \\
&\;\;\vdots \\
a_{n1}x_1 + a_{n2}x_2 + \cdots + a_{nn}x_n &= b_n
\end{aligned}
\]

by subtracting a suitable multiple of each equation from the one below it
until a triangular system is obtained:

\[
\begin{aligned}
a'_{11}x_1 + a'_{12}x_2 + \cdots + a'_{1n}x_n &= b'_1 \\
a'_{22}x_2 + \cdots + a'_{2n}x_n &= b'_2 \\
&\;\;\vdots \\
a'_{nn}x_n &= b'_n,
\end{aligned}
\]

then solving for $x_n, x_{n-1}, \ldots, x_1$ in turn by successive substitutions. Such
calculations were particularly well suited to a Chinese device called the 
counting board, which held the array of coefficients and allowed “row 
operations” like those we perform with matrices. For further details, see 
Li and Du (1987) or Martzloff (2006). 
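
The elimination and back-substitution just described can be sketched in a few lines. This is a modern illustration with exact fractions, not a reconstruction of counting-board practice; the function name is hypothetical, the system is assumed to be square with a unique solution, and the row swap for a zero pivot is an addition the text does not mention.

```python
from fractions import Fraction

def solve_linear_system(a, b):
    """Solve a x = b by reduction to triangular form and back-substitution.
    Assumes a square system with a unique solution."""
    n = len(b)
    # Augmented rows [a_i1, ..., a_in, b_i] in exact rational arithmetic.
    rows = [[Fraction(v) for v in a[i]] + [Fraction(b[i])] for i in range(n)]
    for col in range(n):
        # Bring up a row with a nonzero entry in this column.
        pivot = next(r for r in range(col, n) if rows[r][col] != 0)
        rows[col], rows[pivot] = rows[pivot], rows[col]
        # Subtract a suitable multiple from every row below.
        for r in range(col + 1, n):
            factor = rows[r][col] / rows[col][col]
            rows[r] = [x - factor * y for x, y in zip(rows[r], rows[col])]
    # Solve for x_n, x_{n-1}, ..., x_1 in turn.
    x = [Fraction(0)] * n
    for i in reversed(range(n)):
        known = sum(rows[i][j] * x[j] for j in range(i + 1, n))
        x[i] = (rows[i][n] - known) / rows[i][i]
    return x

# An illustrative 3-by-3 system.
print(solve_linear_system([[3, 2, 1], [2, 3, 1], [1, 2, 3]], [39, 34, 26]))
```

For this system the solution is x = 37/4, y = 17/4, z = 11/4, which can be confirmed by substitution.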

Around the 12th century, Chinese mathematicians found that elimi- 
nation could be adapted to simultaneous polynomial equations in two or 
more variables. For example, one can eliminate y between the pair of equa- 
tions 


\[ a_0(x)y^m + a_1(x)y^{m-1} + \cdots + a_m(x) = 0, \tag{1} \]
\[ b_0(x)y^m + b_1(x)y^{m-1} + \cdots + b_m(x) = 0, \tag{2} \]

where the $a_i(x)$, $b_j(x)$ are polynomials in x. The $y^m$ term can be eliminated
by forming the equation $b_0(x) \times (1) - a_0(x) \times (2)$, say,

\[ c_0(x)y^{m-1} + c_1(x)y^{m-2} + \cdots + c_{m-1}(x) = 0. \tag{3} \]

We can form a second equation of degree m − 1 in y by multiplying (3) by
y, then again eliminating $y^m$ between (3) × y and (1), giving, say,

\[ d_0(x)y^{m-1} + d_1(x)y^{m-2} + \cdots + d_{m-1}(x) = 0. \tag{4} \]


The problem is now reduced to eliminating y between the equations (3) 
and (4), which are of lower degree in y than (1) and (2). Thus one can 
continue inductively until an equation in x alone is obtained. This method 
was extended to four variables in the work of Zhū Shìjié (1303) entitled
Siyuan yujian (Jade Mirror of Four Unknowns). 
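
A small worked instance may make the procedure concrete. The two equations below are hypothetical, chosen only to keep the arithmetic short, and are not taken from the Chinese sources. Here m = 2 and both leading coefficients are 1, so the first elimination is a plain subtraction.

```latex
\begin{align*}
&(1)\quad y^2 + xy - 3 = 0, \qquad (2)\quad y^2 - xy - 1 = 0, \\
&(1) - (2):\quad 2xy - 2 = 0, \quad\text{that is,}\quad (3)\quad xy - 1 = 0, \\
&x \times (1) - y \times (3):\quad (4)\quad (x^2 + 1)\,y - 3x = 0, \\
&(x^2 + 1) \times (3) - x \times (4):\quad 2x^2 - 1 = 0.
\end{align*}
```

So $x = \pm 1/\sqrt{2}$, and (3) then gives $y = 1/x = \pm\sqrt{2}$. Substituting back, $y^2 + xy - 3 = 2 + 1 - 3 = 0$ and $y^2 - xy - 1 = 2 - 1 - 1 = 0$, as required.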

As we will see in Chapter 6, the two-variable polynomial problem 
arose in the West in the 17th century, in the context of finding intersec- 
tions of curves. This led first to a rediscovery of the method of elimination 
for polynomials; only later was this method based on an understanding 
of linear equations (and determinants, see Chapter 16). The well-known 
Cramer’s rule for solving linear equations using determinants was named 
after its appearance in a book on algebraic curves, Cramer (1750). 


EXERCISES 


The first interesting case of elimination between two-variable polynomials 
occurs when the polynomials have degree 2. Geometrically, this amounts to find- 
ing the intersections of two conic sections. 


5.2.1 Derive an equation that is linear in y from the two equations
\[ x^2 + xy + y^2 = 1, \]
\[ 4x^2 + 3xy + 2y^2 = 3, \]
and hence show that $y = (1 - 2x^2)/x$.


5.2.2 Deduce that the intersections of the two curves in Exercise 5.2.1 occur
where x satisfies $3x^4 - 4x^2 + 1 = 0$.


This example, where the two equations of degree 2 yield a single equation of
degree 4 (= 2 × 2), illustrates a general phenomenon where degrees are multiplied.
We will observe other instances, and study it more deeply, as the book progresses.

The present example is not a typical equation of degree 4, since it is quadratic
in $x^2 = z$. However, this makes it a lot easier to solve.


5.2.3 Solve $3z^2 - 4z + 1 = 0$ for $z = x^2$ by factorizing the left-hand side, and
hence find four solutions for x.
Give geometric reasons why you would expect two curves of degree 2 to 
have up to four intersections. Could they have more than four? 


The Jade Mirror of Four Unknowns does not go beyond four equations in 
four unknowns (hence the name). The idea is quite general, but it becomes hard 
to implement on the counting board when there are more than four unknowns. An 
amusing problem in three unknowns from the Jade Mirror, which does not require 
the full strength of the elimination method, is given in the exercises below. 


5.2.4 Problem 2 in the Jade Mirror (see Hoe (1977), p. 135) is to find the side a
of a right-angled triangle (a, b, c) such that
\[ a^2 - (b + c - a) = ab, \]
\[ b^2 + (a + c - b) = bc. \]
The Jade Mirror suggests choosing the unknowns x = a and y = b + c.
Using $a^2 = c^2 - b^2$, show that this implies
\[ b = (y - x^2/y)/2, \]
\[ c = (y + x^2/y)/2. \]

5.2.5 Deduce that the first two equations in Exercise 5.2.4 are equivalent, respectively, to
\[ (-2 - x)y^2 + (2x + 2x^2)y + x^3 = 0, \]
\[ (2 - x)y^2 + 2xy + x^3 = 0. \]

5.2.6 By subtracting one equation in Exercise 5.2.5 from the other, deduce that
$y = x^2/2$. Substitute this back to obtain a quadratic equation for x, with
solution x = a = 4. What are the values of b and c?


5.3 Quadratic Equations


As early as 2000 BCE, the Babylonians could solve a pair of simultaneous
equations of the form

\[ x + y = p, \]
\[ xy = q, \]

which are equivalent to the quadratic equation

\[ x^2 + q = px. \]

The original pair was solved by a method that gave the two roots of the
quadratic,

\[ x, y = \frac{p}{2} \pm \sqrt{\left(\frac{p}{2}\right)^2 - q}, \]

when both were positive (the Babylonians did not admit negative numbers).
The steps in the method were as follows:


(i) Form $\frac{x+y}{2}$.


(ii) Form $\left(\frac{x+y}{2}\right)^2$.


(iii) Form $\left(\frac{x+y}{2}\right)^2 - xy$.


(iv) Form $\sqrt{\left(\frac{x+y}{2}\right)^2 - xy} = \frac{x-y}{2}$.


(v) Find x, y by inspection of the values in (i), (iv).


(See Boyer (1968), p. 34, for an actual example.) Of course, these steps
were not expressed in symbols but only applied to specific numbers. Nev- 
ertheless, a general method is implicit in the many specific cases solved. 
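
Steps (i) to (v) translate directly into a short routine. This is a sketch in modern notation, with a hypothetical function name and floating-point square roots; the Babylonians worked only with specific numbers, as noted above.

```python
import math

def babylonian_pair(p, q):
    """Find x >= y with x + y = p and x * y = q, following steps (i)-(v)."""
    half_sum = p / 2                       # (i)   (x + y)/2
    square = half_sum ** 2                 # (ii)  ((x + y)/2)^2
    difference = square - q                # (iii) ((x + y)/2)^2 - xy
    if difference < 0:
        raise ValueError("no real solution")
    half_diff = math.sqrt(difference)      # (iv)  (x - y)/2
    # (v) x and y are the half-sum plus and minus the half-difference.
    return half_sum + half_diff, half_sum - half_diff

print(babylonian_pair(10, 21))
```

For p = 10 and q = 21 the half-sum is 5 and the half-difference is 2, so the pair is 7 and 3.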

An explicit general method, expressed as a formula in words, was 
given by Brahmagupta (628): 


To the absolute number multiplied by four times the [coeffi- 
cient of the] square, add the square of the [coefficient of the] 
middle term; the square root of the same, less the [coefficient 
of the] middle term, being divided by twice the [coefficient of 
the] square is the value. 


Colebrooke (1817), p. 346 

This is the solution

\[ x = \frac{\sqrt{4ac + b^2} - b}{2a} \]

of the equation

\[ ax^2 + bx = c, \]

yet one wonders whether Brahmagupta understood it quite this way when,
a few lines later, he gives another rule that is trivially equivalent to the first
when expressed in our notation:

\[ x = \frac{\sqrt{ac + (b/2)^2} - (b/2)}{a}. \]

The methods of the Babylonians and Brahmagupta clearly give correct 
solutions, but their basis is not clear. The meaning of square roots, for example,
was not questioned as it was by the Greeks. A rigorous basis for the solution
of quadratic equations can be found in Euclid’s Elements, Book VI.
His Proposition 28 can be viewed as a solution of the general quadratic 
equation in the case where there is a positive root, as Heath (1925), Vol. 2, 
p. 263 explains. However, the algebraic interpretation is far from obvious 
even when one specializes the proposition, which is about parallelograms, 
to one about rectangles. It seems unlikely that Euclid was aware of the alge- 
bra, or he would have expressed it by much simpler geometry. 

The transition from geometry to algebra can be seen in al-Khwarizmi’s 
solution of a quadratic equation (Figure 5.1). The solution is still expressed 
in geometric language, but now the geometry is a direct embodiment of the 
algebra. It is really the standard algebraic solution, but with “squares” and 
“products” understood literally as geometric squares and rectangles. To 
solve $x^2 + 10x = 39$, represent $x^2$ by a square of side x, and 10x by two
5 × x rectangles as in Figure 5.1. The extra square of area 25 “completes
the square” of side x + 5 to one of area 25 + 39, since 39 is the given value
of $x^2 + 10x$. Thus the big square has area 64, hence its side x + 5 equals 8.
This gives the solution x = 3. 

Euclid and al-Khwarizmi did not admit negative lengths, so the solution
x = −13 to $x^2 + 10x = 39$ does not appear. This is quite natural, since
geometry admits only one square with area 64. Avoiding negative coefficients,
however, causes algebraic complications. There is not one general
quadratic equation, but three, corresponding to the different ways of distributing
positive terms between the two sides: $x^2 + ax = b$, $x^2 = ax + b$,
$x^2 + b = ax$.


Figure 5.1: Solving a quadratic equation 


EXERCISES 


Quadratic equations arise frequently in geometry because distance is gov- 
erned by a quadratic equation (ultimately, by the Pythagorean theorem). In fact, 
the points created from rational points by any ruler and compass construction can 
be found by solving a series of linear or quadratic equations, which is why they 
can be expressed by rational operations and square roots. This result, which was 
claimed in Section 2.3, can be proved as follows. 


5.3.1 Show that the line through two rational points has an equation with rational 
coefficients. 


5.3.2 Show that a circle whose center is a rational point and whose radius is 
rational has an equation with rational coefficients. 


Your proof should show, more generally, that a line or circle constructed from 
any points has an equation with coefficients obtainable from the coordinates of 
the given points by rational operations. It then suffices to show that intersections 
of lines and circles can be obtained from the coefficients of their equations by 
rational operations and square roots. 


5.3.3 Show that the intersection of two lines can be computed by rational opera- 
tions. 


5.3.4 Show that the intersection of a line and a circle can be computed by rational 
operations and a square root (because it depends on solving a quadratic 
equation). 


The last, and hardest, case is finding the intersection of two circles. Fortu- 
nately, it is easy to reduce these two quadratic equations to the case just handled 
in Exercise 5.3.4. 




5.3.5 The equations of any two circles can be written in the form 


\[ (x - a)^2 + (y - b)^2 = r^2, \]

\[ (x - c)^2 + (y - d)^2 = s^2. \]


Explain why. Now subtract one of these equations from the other, and 
hence show that their common solutions can be found by rational oper- 
ations and square roots. 
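
The reduction in Exercise 5.3.5 can be followed step by step in code. This is a sketch with a hypothetical function name, floating-point arithmetic, and the assumption that the two centers are distinct. Subtracting the circle equations leaves a line, and intersecting that line with one circle needs only rational operations and a square root.

```python
import math

def circle_intersections(a, b, r, c, d, s):
    """Common points of (x-a)^2 + (y-b)^2 = r^2 and (x-c)^2 + (y-d)^2 = s^2.
    Assumes the centers (a, b) and (c, d) are distinct."""
    # Subtracting the equations leaves the line  A x + B y = C.
    A, B = 2 * (c - a), 2 * (d - b)
    C = r * r - s * s + c * c - a * a + d * d - b * b
    # Foot of the perpendicular from the first center to that line.
    t = (C - A * a - B * b) / (A * A + B * B)
    fx, fy = a + A * t, b + B * t
    # Squared half-chord length; negative means the circles do not meet.
    h2 = r * r - (fx - a) ** 2 - (fy - b) ** 2
    if h2 < 0:
        return []
    h = math.sqrt(h2) / math.hypot(A, B)
    return [(fx - B * h, fy + A * h), (fx + B * h, fy - A * h)]

print(circle_intersections(0, 0, 5, 6, 0, 5))
```

For two circles of radius 5 centered at (0, 0) and (6, 0), the line is x = 3 and the common points are (3, 4) and (3, −4), up to rounding.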


When a sequence of quadratic equations is solved, the solution may involve
nested square roots, such as $\sqrt{(5 + \sqrt{5})/2}$. This very number, in fact, occurs in
the icosahedron, as one sees from Pacioli’s construction in Section 2.2.


5.3.6 Show that the diagonal of a golden rectangle (which is also the diameter of
an icosahedron of edge length 1) is $\sqrt{(5 + \sqrt{5})/2}$.


5.4 Quadratic Irrationals 


The roots of quadratic equations with rational coefficients are numbers
of the form $a + \sqrt{b}$, where a, b are rational. Euclid took the theory of
irrationals further in Book X of the Elements with a very detailed study
of numbers of the form $\sqrt{\sqrt{a} + \sqrt{b}}$, where a, b are rational. Book X is
the longest book in the Elements and it is not clear why Euclid devoted 
so much space to this topic: perhaps because some of it is needed for the 
study of regular polyhedra in Book XIII (see Section 2.2 and Exercise 
5.3.6), perhaps simply because it was Euclid’s favorite topic, or perhaps it 
was one in which he had some original contributions to show off. It is said 
that Apollonius took the theory of irrationals further, but unfortunately his 
work on the subject is lost. 

After this, there seems to have been no progress in the theory of irra- 
tionals until the Renaissance, except for a remarkable isolated result by 
Fibonacci (1225). Fibonacci showed that the roots of $x^3 + 2x^2 + 10x = 20$
are not any of Euclid’s irrationals. This is not a proof, as some historians 
have thought, that the roots cannot be constructed by ruler and compass. 
Fibonacci did not rule out all expressions built from rationals and square 
roots; nevertheless, it was the first step into the world of irrationals beyond 
Euclid. 

At this point it is worth asking how difficult it is to show that a specific
number, say, $\sqrt[3]{2}$, cannot be constructed from rational numbers by
square roots. The answer will depend on how well the reader manages
the following exercises. The manipulation required would certainly not
have been beyond the 16th-century algebraists. The subtle part is finding a
suitable classification of expressions according to complexity (extending
Euclid’s classification to expressions in which radical signs are nested to
arbitrary depth) and using induction on the level of complexity. This type
of thinking did not emerge until the 1820s, hence the relatively late proof
that $\sqrt[3]{2}$ is not constructible by ruler and compass, by Wantzel (1837). A
few decades later, the proof became a routine part of the theory of fields
and vector spaces, as we will see in Chapter 16.


EXERCISES 


An elementary proof that $\sqrt[3]{2}$ is not constructible was found by the number
theorist Edmund Landau (1877–1938) when he was still a student. It is broken
down to easy steps below. But first we should check that $\sqrt[3]{2}$ is actually irrational.


5.4.1 Show that the assumption $\sqrt[3]{2} = m/n$, where m and n are integers, leads to
a contradiction.


Landau’s proof now organizes all numbers involved in a construction into sets
$F_0, F_1, F_2, \ldots$, according to the depth of nesting of square roots.


5.4.2 Let
\[ F_0 = \{\text{rationals}\}, \qquad F_{k+1} = \{\, a + b\sqrt{c_k} : a, b \in F_k \,\} \text{ for some } c_k \in F_k. \]

Show that each $F_i$ is a field, that is, if x, y are in $F_i$, then so are x + y, x − y,
xy, and x/y (for y ≠ 0).


We know from Exercise 5.4.1 that $\sqrt[3]{2}$ is not in $F_0$, but if it is constructible it
will occur in some $F_{k+1}$. A contradiction now ensues by considering (hypothetically)
the first such $F_{k+1}$.


5.4.3 Show that if $a, b, c \in F_k$ but $\sqrt{c} \notin F_k$, then $a + b\sqrt{c} = 0 \iff a = b = 0$.
(For k = 0 this is in the Elements, Book X, Prop. 79.)


5.4.4 Suppose $\sqrt[3]{2} = a + b\sqrt{c}$, where $a, b, c \in F_k$, but that $\sqrt[3]{2} \notin F_k$. (We know that
$\sqrt[3]{2} \notin F_0$ from Exercise 5.4.1.) Cube both sides and deduce from Exercise
5.4.3 that

\[ 2 = a^3 + 3ab^2c \quad\text{and}\quad 0 = 3a^2b + b^3c. \]


5.4.5 Deduce from Exercise 5.4.4 that $\sqrt[3]{2} = a - b\sqrt{c}$ also, and explain why this
is a contradiction.
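
The first level of this hierarchy can be modeled directly. The class below is a sketch with hypothetical names, covering only numbers $a + b\sqrt{c}$ with rational a, b and one fixed rational c that is not a perfect square (the case k = 0). It shows the four field operations staying inside the set, and cubing reproduces the pattern of coefficients used in Exercise 5.4.4.

```python
from fractions import Fraction

class QuadExt:
    """The number a + b*sqrt(c), with rational a, b and a fixed rational c
    assumed not to be a perfect square."""

    def __init__(self, a, b, c):
        self.a, self.b, self.c = Fraction(a), Fraction(b), Fraction(c)

    def __add__(self, other):
        return QuadExt(self.a + other.a, self.b + other.b, self.c)

    def __sub__(self, other):
        return QuadExt(self.a - other.a, self.b - other.b, self.c)

    def __mul__(self, other):
        # (a + b sqrt c)(a' + b' sqrt c) = (aa' + bb'c) + (ab' + ba') sqrt c
        return QuadExt(self.a * other.a + self.b * other.b * self.c,
                       self.a * other.b + self.b * other.a, self.c)

    def __truediv__(self, other):
        # Multiply by the conjugate; the norm is nonzero for a nonzero divisor
        # because sqrt(c) is irrational.
        norm = other.a ** 2 - other.b ** 2 * other.c
        return self * QuadExt(other.a / norm, -other.b / norm, self.c)

    def __eq__(self, other):
        return (self.a, self.b, self.c) == (other.a, other.b, other.c)

    def __repr__(self):
        return f"{self.a} + {self.b}*sqrt({self.c})"

x = QuadExt(1, 1, 2)          # 1 + sqrt(2)
print(x * x * x)              # (a^3 + 3ab^2c) + (3a^2b + b^3c) sqrt(c)
```

With a = b = 1 and c = 2 the cube is 7 + 5√2, matching $a^3 + 3ab^2c = 7$ and $3a^2b + b^3c = 5$.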




5.5 The Solution of the Cubic 


In our own days Scipione del Ferro of Bologna has solved 
the case of the cube and first power equal to a constant, a 
very elegant and admirable accomplishment. Since this art 
surpasses all human subtlety and the perspicuity of mortal 
talent and is a truly celestial gift and a very clear test of the 
capacity of men’s minds, whoever applies himself to it will 
believe that there is nothing that he cannot understand. In 
emulation of him, my friend Niccolò Tartaglia of Brescia,
wanting not to be outdone, solved the same case when he got 
into a contest with his [Scipione’s] pupil, Antonio Maria Fior, 
and, moved by my many entreaties, gave it to me ... having 
received Tartaglia’s solution and seeking a proof of it, I came 
to understand that there were a great many other things that 
could also be had. Pursuing this thought and with increased 
confidence, I discovered these others, partly by myself and 
partly through Lodovico Ferrari, formerly my pupil. 


Cardano (1545), p. 8 


The solution of cubic equations in the early 16th century was the first 
clear advance in mathematics since the time of the Greeks. It revealed the 
power of algebra that the Greeks had not been able to harness, power that 
was soon to clear a new path to geometry, which was virtually a royal 
road (algebraic geometry and calculus). Cardano’s elation at the discovery 
was well-founded. Even in the 20th century, personally discovering the 
solution of the cubic equation has been the inspiration for at least one 
distinguished mathematical career—see Kac (1984). 

As for the history of the original discovery, we know little more than 
Cardano tells us. Scipione del Ferro died in 1526, so the first solution was 
known before then. Tartaglia discovered his solution on February 12, 1535, 
probably independently, because he solved all problems in the contest with 
del Ferro’s pupil Fior, while Fior did not. Cardano has been accused by 
almost everyone, from Tartaglia on, of stealing Tartaglia’s solution, but his 
own account seems to distribute credit quite fairly. For more background, 
see the introduction and preface to Cardano (1545) and Crossley (1987). 

Cardano presents algebra in the geometric style of al-Khwarizmi (whom 
he describes as the originator of algebra at the beginning of the book), with 
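the case distinctions caused by avoiding negative coefficients. Ignoring
these complications, his solution can be described as follows. The cubic
equation $x^3 + ax^2 + bx + c = 0$ is first transformed into one with no quadratic
term by a linear change of variable, x = y − a/3. One then has, say,

\[ y^3 = py + q. \]

By setting y = u + v, the left-hand side becomes

\[ (u^3 + v^3) + 3uv(u + v) = 3uvy + (u^3 + v^3), \]

which equals the previous right-hand side if

\[ 3uv = p, \]
\[ u^3 + v^3 = q. \]

Eliminating v gives a quadratic in $u^3$,

\[ u^3 + \left(\frac{p}{3u}\right)^3 = q, \]

with roots

\[ \frac{q}{2} \pm \sqrt{\left(\frac{q}{2}\right)^2 - \left(\frac{p}{3}\right)^3}. \]

By symmetry, we obtain the same values for $v^3$. And since $u^3 + v^3 = q$, if
one of the roots is taken to be $u^3$, the other is $v^3$. Without loss of generality
we can take

\[ u^3 = \frac{q}{2} + \sqrt{\left(\frac{q}{2}\right)^2 - \left(\frac{p}{3}\right)^3}, \]
\[ v^3 = \frac{q}{2} - \sqrt{\left(\frac{q}{2}\right)^2 - \left(\frac{p}{3}\right)^3}, \]

and hence

\[ y = u + v = \sqrt[3]{\frac{q}{2} + \sqrt{\left(\frac{q}{2}\right)^2 - \left(\frac{p}{3}\right)^3}} + \sqrt[3]{\frac{q}{2} - \sqrt{\left(\frac{q}{2}\right)^2 - \left(\frac{p}{3}\right)^3}}. \]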


EXERCISES 


The two equations 3uv = p, $u^3 + v^3 = q$ provide another instance of the
phenomenon noted in Exercise 5.2.2: when a variable is eliminated between two
equations, the degrees of the equations are multiplied.


5.5.1 The equation 3uv = p is of degree 2 in u and v, and $u^3 + v^3 = q$ is of degree
3. What about the equation obtained by eliminating v?


The Cardano formula produces some surprising results, which we look at 
again in Section 11.2. But first let us test it on a really simple cubic equation. 


5.5.2 Use Cardano’s formula to solve $y^3 = 2$. Do you get the obvious solution?
Now try one where the solution is less obvious. 


5.5.3 Use Cardano’s formula to solve $y^3 = 6y + 6$, and check your answer by
substitution. 


5.6 Angle Division 


Another important contributor to algebra in the 16th century was Viète
(1540-1603). He helped emancipate algebra from the geometric style of 
proof by introducing letters for unknowns and using plus and minus signs 
to facilitate manipulation. Yet at the same time he strengthened its ties 
with geometry at a higher level by relating algebra to trigonometry. A case 
in point is his solution of the cubic by circular functions (Viète (1591),
Ch. VI, Theorem 3), which shows that solving the cubic is equivalent to 
trisecting an arbitrary angle. 
Namely, if we take the cubic in the form
$$x^3 + ax + b = 0,$$
we can reduce it to an equation
$$4y^3 - 3y = c$$
with just one parameter, by setting $x = ky$ and choosing $k$ so that
$$\frac{k^3}{ak} = -\frac{4}{3}, \quad \text{i.e.,} \quad k = \sqrt{-\frac{4a}{3}}.$$
The point of the expression $4y^3 - 3y$ is that
$$4\cos^3\theta - 3\cos\theta = \cos 3\theta;$$
so by setting $y = \cos\theta$ we obtain
$$\cos 3\theta = c.$$
If we are given $c$, then we can construct a triangle with angle $\cos^{-1} c = 3\theta$.
Trisection of this angle gives us the solution $y = \cos\theta$ of the equation.
Conversely, the problem of trisecting an angle with cosine $c$ is equivalent
to solving the cubic equation $4y^3 - 3y = c$.
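
The reduction can be tried out numerically. The sketch below is only an illustration under stated assumptions: the function name `viete_roots` is ours, the value $c = -4b/k^3$ is what one gets by substituting $x = ky$ into $x^3 + ax + b = 0$ and multiplying by $4/k^3$ (the text does not write it out), and the code handles only the case $a < 0$, $|c| \le 1$. Taking the angles $\theta + 2\pi j/3$ as well as $\theta$ is our addition; each of them also has $\cos 3\theta = c$.

```python
import math

def viete_roots(a, b):
    """Real roots of x^3 + a*x + b = 0 via y = cos(theta); assumes a < 0, |c| <= 1."""
    if a >= 0:
        raise ValueError("k = sqrt(-4a/3) is real and nonzero only for a < 0")
    k = math.sqrt(-4 * a / 3)
    c = -4 * b / k ** 3            # right-hand side of 4y^3 - 3y = c
    if abs(c) > 1:
        raise ValueError("|c| > 1: no real angle has cosine c")
    triple_angle = math.acos(c)    # this is 3*theta
    # Trisect the angle; shifting 3*theta by multiples of 2*pi gives the other roots.
    return [k * math.cos((triple_angle + 2 * math.pi * j) / 3) for j in range(3)]

# Illustrative cubic: x^3 - 3x + 1 = 0, for which k = 2 and c = -1/2.
for x in viete_roots(-3, 1):
    print(x, x ** 3 - 3 * x + 1)   # second column should be close to 0
```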

Of course, there is a problem with trigonometric interpretation when
$|c| > 1$, which requires complex numbers for its resolution. Complex numbers
are also involved in Cardano’s formula, since the expression under the
square root sign, $(q/2)^2 - (p/3)^3$, can be negative. In fact, Viète’s method
requires complex numbers only when Cardano’s does not, so between the
two of them, complex numbers are avoided. Nevertheless, cubic equations
are the birthplace of complex numbers, as we will see in Chapter 11.
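
The complementarity of the two methods can be checked by a short calculation. The derivation below is ours and is offered only as a sketch: it writes Cardano's cubic $y^3 = py + q$ in the form $x^3 + ax + b = 0$ (with Cardano's unknown in the role of $x$), so that $a = -p$ and $b = -q$, and it uses $c = -4b/k^3$, which follows from substituting $x = ky$ and multiplying by $4/k^3$.

```latex
\begin{align*}
k^2 &= -\frac{4a}{3} = \frac{4p}{3}, \qquad
c = -\frac{4b}{k^3} = \frac{4q}{k^3}, \\
c^2 &= \frac{16q^2}{k^6} = \frac{16q^2}{(4p/3)^3}
     = \frac{27q^2}{4p^3} = \frac{(q/2)^2}{(p/3)^3}.
\end{align*}
```

So, for $p > 0$, the condition $|c| \le 1$ holds exactly when $(q/2)^2 - (p/3)^3 \le 0$, which is the case where the quantity under Cardano's square root is not positive.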

Astonishingly, the problem of dividing an angle into any odd number
of equal parts has an algebraic solution analogous to the algebraic solution
of the cubic. Viète (1579) himself took the problem as far as finding
expressions for $\cos n\theta$ and $\sin n\theta$ as polynomials in $\cos\theta$ and $\sin\theta$, at least
for certain values of $n$. Newton read Viète in 1663-4 and found the equation
$$y = nx - \frac{n(n^2 - 1)}{3!}x^3 + \frac{n(n^2 - 1)(n^2 - 3^2)}{5!}x^5 - \cdots$$
relating $y = \sin n\theta$ and $x = \sin\theta$ (see Newton (1676a) in Turnbull (1960)).
He asserted this result for arbitrary $n$, but we are interested in the case
of odd integral $n$, when it reduces to a polynomial equation of degree $n$.
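
For odd $n$ the series terminates, and this is easy to see by computing its coefficients. The following sketch is an illustration only: the names `newton_coefficients` and `sin_multiple` are ours, and the recurrence in the loop is simply the ratio of consecutive terms in the series $nx - \frac{n(n^2-1)}{3!}x^3 + \frac{n(n^2-1)(n^2-3^2)}{5!}x^5 - \cdots$.

```python
from fractions import Fraction
import math

def newton_coefficients(n):
    """Coefficients of x, x^3, x^5, ... in Newton's series, for odd positive n."""
    if n <= 0 or n % 2 == 0:
        raise ValueError("this sketch handles only odd positive integers n")
    coeffs = []
    term = Fraction(n)
    k = 1
    while term != 0:
        coeffs.append(term)
        # Each coefficient is the previous one times -(n^2 - k^2) / ((k+1)(k+2)).
        term = -term * (n * n - k * k) / ((k + 1) * (k + 2))
        k += 2
    return coeffs

def sin_multiple(n, x):
    """Evaluate the terminating series at x = sin(theta)."""
    return sum(float(c) * x ** (2 * i + 1)
               for i, c in enumerate(newton_coefficients(n)))

theta = 0.3
print(newton_coefficients(5))                               # 5, -20, 16 as Fractions
print(sin_multiple(5, math.sin(theta)), math.sin(5 * theta))  # the two values should agree
```

For $n = 5$ this gives $y = 5x - 20x^3 + 16x^5$, a polynomial equation of degree 5, as the text says.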


The surprise is that Newton’s equation then has a solution by $n$th roots
analogous to the Cardano formula for cubics,
$$x = \frac{1}{2}\sqrt[n]{y + \sqrt{y^2 - 1}} + \frac{1}{2}\sqrt[n]{y - \sqrt{y^2 - 1}}, \tag{1}$$
although only for $n$ of the form $4m + 1$. This formula appears out of the
blue in de Moivre (1707).¹ He does not explain how he found it, but it is
comprehensible to us as
$$\sin\theta = \frac{1}{2}\sqrt[n]{\sin n\theta + i\cos n\theta} + \frac{1}{2}\sqrt[n]{\sin n\theta - i\cos n\theta}, \tag{2}$$
a consequence of our version of de Moivre’s formula
$$(\cos\theta + i\sin\theta)^n = \cos n\theta + i\sin n\theta \tag{3}$$
when $n = 4m + 1$. (See Exercises 5.6.2 and 5.6.3.)

¹It also appears in the unpublished Leibniz (1675), though without the restriction on $n$.
See Schneider (1968), pp. 224-228.
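
The dependence on $n = 4m + 1$ can be observed numerically before it is proved. The sketch below (function names `lhs` and `rhs` are ours) compares $(\sin\theta + i\cos\theta)^n$ with $\pm(\sin n\theta + i\cos n\theta)$ for a few odd $n$; it is a numerical check at one angle, not a proof, and the proof is the subject of Exercise 5.6.2.

```python
import math

def lhs(theta, n):
    """(sin(theta) + i*cos(theta))**n as a Python complex number."""
    return complex(math.sin(theta), math.cos(theta)) ** n

def rhs(theta, n):
    """sin(n*theta) + i*cos(n*theta)."""
    return complex(math.sin(n * theta), math.cos(n * theta))

theta = 0.7   # an arbitrary test angle
for n in (1, 3, 5, 7, 9, 11):
    sign = 1 if n % 4 == 1 else -1      # +1 for n = 4m+1, -1 for n = 4m+3
    print(n, sign, abs(lhs(theta, n) - sign * rhs(theta, n)))   # last column near 0
```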

Viète himself came remarkably close to (3) in a posthumously published
work, Viète (1615). He observed that the products of $\sin\theta$, $\cos\theta$
that occur in $\cos n\theta$, $\sin n\theta$ are the alternate terms in the expansion of
$(\cos\theta + \sin\theta)^n$, except for certain minus signs. He failed only to notice
that the signs could be explained by giving $\sin\theta$ the coefficient $i$. In any
case, such an explanation would not have seemed natural to his contemporaries,
who were far more comfortable with Cardano’s formula than they
were with $i$. In Section 12.1 we will see how de Moivre’s formula evolved
with the development of complex numbers.


EXERCISES 


A good use of de Moivre’s formula is to prove the formula for $\cos 3\theta$ involved
in Viète’s solution of the cubic.


5.6.1 Use $(\cos\theta + i\sin\theta)^3 = \cos 3\theta + i\sin 3\theta$ to find a formula for $\cos 3\theta$.


The reasons why (1) and (2) hold only for certain integer values, while (3) holds
for all, can be understood by actually working out $(\sin\theta + i\cos\theta)^n$.


5.6.2 Use (3) and $\sin\theta = \cos(\pi/2 - \theta)$, $\cos\theta = \sin(\pi/2 - \theta)$ to show that
$$(\sin\theta + i\cos\theta)^n = \begin{cases}
\sin n\theta + i\cos n\theta & \text{when } n = 4m + 1, \\
-\sin n\theta - i\cos n\theta & \text{when } n = 4m + 3.
\end{cases}$$


5.6.3 Deduce from Exercise 5.6.2 that (2) is correct for $n = 4m + 1$ and false for
$n = 4m + 3$, and hence that (1) is a correct relation between $y = \sin n\theta$ and
$x = \sin\theta$ only when $n = 4m + 1$.


5.6.4 Show that (1) is a correct relation between $y = \cos n\theta$ and $x = \cos\theta$ for all
$n$ (de Moivre (1730)).


5.7 Higher-Degree Equations 


The general fourth-degree, or quartic, equation
$$x^4 + ax^3 + bx^2 + cx + d = 0$$
was solved by Cardano’s student Ferrari, and the solution was published
in Cardano (1545), p. 237. A linear transformation reduces the equation to
the form
$$x^4 + px^2 + qx + r = 0,$$
or
$$(x^2 + p)^2 = px^2 - qx + p^2 - r.$$
Then for any $y$,
$$\begin{aligned}
(x^2 + p + y)^2 &= (px^2 - qx + p^2 - r) + 2y(x^2 + p) + y^2 \\
&= (p + 2y)x^2 - qx + (p^2 - r + 2py + y^2).
\end{aligned}$$

The quadratic $Ax^2 + Bx + C$ on the right-hand side will be a square if
$B^2 - 4AC = 0$, which is a cubic equation for $y$. We can therefore solve for
$y$ and take the square root of both sides of the equation for $x$, which then
becomes quadratic and hence also solvable. The final result is a formula for
$x$ using just square and cube roots of rational functions of the coefficients.
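
The steps of Ferrari's method can be followed in a small numerical sketch. Two caveats: the function name `ferrari_roots` is ours, and, to keep the example short, the cubic for $y$ is solved here by bisection rather than by Cardano's formula, so this illustrates the structure of the method and not solution by radicals. The sketch assumes real $p$, $q$, $r$ with $q \ne 0$.

```python
import cmath

def ferrari_roots(p, q, r, iterations=200):
    """Roots of x^4 + p*x^2 + q*x + r = 0, assuming real p, q, r with q != 0."""
    def resolvent(y):
        # 4AC - B^2 with A = p + 2y, B = -q, C = p^2 - r + 2py + y^2;
        # the right-hand side is a square exactly when this vanishes.
        return 4 * (p + 2 * y) * (p * p - r + 2 * p * y + y * y) - q * q

    lo = -p / 2                     # resolvent(lo) = -q^2 < 0
    step = 1.0
    while resolvent(lo + step) <= 0:
        step *= 2
    hi = lo + step                  # resolvent(hi) > 0, so a root lies between
    for _ in range(iterations):     # bisection (our substitute for Cardano's formula)
        mid = (lo + hi) / 2
        if resolvent(mid) <= 0:
            lo = mid
        else:
            hi = mid
    y = hi
    A = p + 2 * y                   # positive, because y > -p/2
    s = A ** 0.5
    roots = []
    for sign in (1, -1):
        # Square roots of both sides: x^2 + p + y = sign * s * (x - q/(2A)).
        b = -sign * s
        c = p + y + sign * q / (2 * s)
        d = cmath.sqrt(b * b - 4 * c)
        roots += [(-b + d) / 2, (-b - d) / 2]
    return roots

# Illustrative quartic: (x - 1)(x - 2)(x - 3)(x + 6) = x^4 - 25x^2 + 60x - 36.
print(sorted(z.real for z in ferrari_roots(-25, 60, -36)))   # close to -6, 1, 2, 3
```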
This impressive bonus to the solution of cubic equations raised hopes 
that higher-degree equations could also be solved by formulas built from 
the coefficients by rational operations and roots, and solution by radicals, 
as it was called, became a major goal of algebra for the next 250 years. 
However, all such efforts to solve the general equation of fifth degree 
(quintic) failed. The most that could be done was to reduce it to the form 


$$x^5 - x - A = 0$$


with only one parameter. This was done by Bring (1786), and a sketch of 
his method may be seen in Pierpont (1895). Bring’s result appeared in a 
very obscure publication and went unnoticed for 50 years, or it might have 
rekindled hopes for the solution of the quintic by radicals. As it happened, 
Ruffini (1799) offered the first proof that this is impossible. Ruffini’s proof 
was not completely convincing, but he was vindicated when a satisfac- 
tory proof was given by Abel (1826), and again with the beautiful general 
theory of equations of Galois (1831b).

A positive outcome of Bring’s result was the analytic solution of the
quintic by Hermite (1858). Reduction to an equation with one parameter
opened the way to a solution by transcendental functions, like Viète’s
solution of the cubic by circular functions. Suitable functions, the elliptic
modular functions, had been discovered by Gauss, Abel, and Jacobi,
and Galois (1831a) had hinted at their relation to quintic equations. This
extraordinary confluence of ideas was the subject of Klein (1884).

In view of the difficulties with the quintic, there was naturally very
little progress with the general equation of degree $n$. However, two simple
but important contributions were made by Descartes (1637). The first was
the superscript notation for powers we now use: $x^3$, $x^4$, $x^5$, and so on.
(Though not $x^2$, oddly enough. The square of $x$ continued to be written $xx$
until well into the next century.) The second was the theorem of Descartes
(1637), p. 159, that a polynomial $p(x)$ with value 0 when $x = a$ has a
factor $(x - a)$. Since division of a polynomial $p(x)$ of degree $n$ by $(x - a)$
leaves a polynomial of degree $n - 1$, Descartes’s theorem raised the hope of
factorizing each $n$th-degree polynomial into $n$ linear factors. As Chapter 11
shows, this hope was fulfilled with the development of complex numbers.
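
Descartes's theorem can be watched in action with the usual schoolbook division of a polynomial by $(x - a)$. The sketch below is an illustration, not a proof; the function name `divide_by_linear` and the convention of listing coefficients from the highest power down are our choices.

```python
def divide_by_linear(coeffs, a):
    """Divide p(x) by (x - a); coeffs lists p's coefficients, highest degree first.

    Returns (quotient coefficients, remainder). The remainder equals p(a),
    so it is 0 exactly when p has the value 0 at x = a.
    """
    running = []
    carry = 0
    for c in coeffs:
        carry = carry * a + c
        running.append(carry)
    remainder = running.pop()
    return running, remainder

# p(x) = x^3 - 6x^2 + 11x - 6 has value 0 at x = 1, so (x - 1) is a factor.
print(divide_by_linear([1, -6, 11, -6], 1))   # ([1, -5, 6], 0)
# The quotient x^2 - 5x + 6 has value 0 at x = 2, leaving the factor (x - 3).
print(divide_by_linear([1, -5, 6], 2))        # ([1, -3], 0)
```

Each division lowers the degree by one, which is the reason for the hope of reaching $n$ linear factors.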
EXERCISES 


The main steps in the proof of Descartes’s theorem go as follows. If the first 
step does not seem sufficiently easy, begin with a = 1. 


5.7.1 Show that $x^n - a^n$ has a factor $x - a$. What is the quotient $(x^n - a^n)/(x - a)$?
(And what does this have to do with geometric series?)


5.7.2 If $p(x) = a_k x^k + a_{k-1}x^{k-1} + \cdots + a_1 x + a_0$, use Exercise 5.7.1 to show that
$p(x) - p(a)$ has a factor $x - a$.


5.7.3 Deduce Descartes’s theorem from Exercise 5.7.2. 


5.8 The Binomial Theorem 


Some important results in algebra/number theory were discovered in the 
Middle Ages, though they failed to take root until they were rediscovered 
in the 17th century or later. Among these were the discovery of 
“Pascal’s triangle” by Chinese mathematicians, and formulas for permu- 
tations and combinations by Levi ben Gershon (1321). Pascal’s triangle 
began to flourish in the 17th century after a long dormancy, so it is of inter- 
est to see what was known of it in medieval times and what Pascal did to 
revive it. 

The Chinese used Pascal’s triangle to generate and tabulate the binomial
coefficients, that is, the coefficients in the formulas
$$\begin{aligned}
(a+b)^1 &= a + b \\
(a+b)^2 &= a^2 + 2ab + b^2 \\
(a+b)^3 &= a^3 + 3a^2b + 3ab^2 + b^3 \\
(a+b)^4 &= a^4 + 4a^3b + 6a^2b^2 + 4ab^3 + b^4 \\
(a+b)^5 &= a^5 + 5a^4b + 10a^3b^2 + 10a^2b^3 + 5ab^4 + b^5 \\
(a+b)^6 &= a^6 + 6a^5b + 15a^4b^2 + 20a^3b^3 + 15a^2b^4 + 6ab^5 + b^6 \\
(a+b)^7 &= a^7 + 7a^6b + 21a^5b^2 + 35a^4b^3 + 35a^3b^4 + 21a^2b^5 + 7ab^6 + b^7
\end{aligned}$$
and so on. When the binomial coefficients are tabulated as follows (with a
trivial row 1 added at the top, corresponding to the power 0 of $a + b$),

                 1
               1   1
             1   2   1
           1   3   3   1
         1   4   6   4   1
       1   5  10  10   5   1
     1   6  15  20  15   6   1
   1   7  21  35  35  21   7   1

and so on, the $k$th element $\binom{n}{k}$ of the $n$th row is the sum
$\binom{n-1}{k-1} + \binom{n-1}{k}$ of the
two elements above it in the $(n - 1)$th row, as follows from the formula
(Exercise 5.8.1)
$$(a + b)^n = (a + b)^{n-1}a + (a + b)^{n-1}b.$$
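
The sum property gives a direct way to generate the table row by row. A minimal sketch, with the function name `pascal_rows` chosen by us:

```python
def pascal_rows(depth):
    """First `depth` rows of Pascal's triangle, starting with the trivial row [1]."""
    rows = [[1]]
    for _ in range(depth - 1):
        prev = rows[-1]
        # Each inner element is the sum of the two elements above it.
        middle = [prev[k - 1] + prev[k] for k in range(1, len(prev))]
        rows.append([1] + middle + [1])
    return rows

for row in pascal_rows(8):
    print(" ".join(str(x) for x in row))
```

With `depth = 8` the last row printed is `1 7 21 35 35 21 7 1`, the coefficients of $(a + b)^7$.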


The triangle appears to a depth of six in Yang Hui (1261) and to a depth 
of eight in Zhu Shijie (1303) (Figure 5.2). Yang Hui attributes the triangle
to Jia Xian, who lived in the 11th century. 

The number $\binom{n}{k}$ appears in medieval Hebrew writings as the number of
combinations of $n$ things taken $k$ at a time. Levi ben Gershon (1321) gives
the formula
$$\binom{n}{k} = \frac{n!}{(n-k)!\,k!}$$
together with the fact that there are $n!$ permutations of $n$ elements.
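
Both facts are easy to confirm by brute-force counting for small cases. In the sketch below the function name `binomial` is ours, and the counts come from Python's standard `itertools` module; it checks particular values and proves nothing in general.

```python
from itertools import combinations, permutations
from math import factorial

def binomial(n, k):
    """n! / ((n - k)! k!), computed with exact integer arithmetic."""
    return factorial(n) // (factorial(n - k) * factorial(k))

n, k = 7, 3
print(binomial(n, k), sum(1 for _ in combinations(range(n), k)))   # 35 35
print(factorial(5), sum(1 for _ in permutations(range(5))))        # 120 120
```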

In view of these excellent results, why do we call the table of binomial
coefficients Pascal’s triangle? It is of course not the only instance
of a mathematical concept being named after a rediscoverer rather than a
discoverer, but in any case Pascal deserves credit for more than just
rediscovery. In his Traité du triangle arithmétique, Pascal (1654) united the
algebraic and combinatorial theories by showing that the elements of the
arithmetic triangle could be interpreted in two ways: as the coefficients of
$a^{n-k}b^k$ in $(a + b)^n$ and as the number of combinations of $n$ things taken
$k$ at a time. As an application, he founded the mathematical theory of
probability by solving the problem of division of stakes (Exercise 5.8.2),
and as a method of proof he consciously used mathematical induction (in
the “base step, induction step” format) for the first time. Altogether, quite
some progress!


Figure 5.2: Chinese Pascal’s triangle


EXERCISES 


The basic properties of the binomial coefficients, for example the fact that
each is the sum of the two above it in Pascal’s triangle, follow easily from their
interpretation as the coefficients in the expansion of $(a + b)^n$.


5.8.1 Use the identity
$$(a + b)^n = (a + b)^{n-1}a + (a + b)^{n-1}b$$
to prove the sum property of binomial coefficients:
$$\binom{n}{k} = \binom{n-1}{k-1} + \binom{n-1}{k}.$$


This property gives an easy way to calculate Pascal’s triangle to any depth,
and hence compute a fair division of stakes in a game that has to be called off
with $n$ plays remaining. We suppose that players I and II have an equal chance of
winning each play, and that I needs to win $k$ of the remaining $n$ plays to carry off
the stakes.


5.8.2 Show that the ratio of I’s chance of winning the stakes to II’s chance of winning is
$$\left[\binom{n}{n} + \binom{n}{n-1} + \cdots + \binom{n}{k}\right] :
\left[\binom{n}{k-1} + \binom{n}{k-2} + \cdots + \binom{n}{0}\right].$$
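
For small $n$ the ratio can be compared with a direct count of all $2^n$ equally likely ways the remaining plays could go. The sketch below is ours (as are the names `stakes_ratio` and `stakes_by_enumeration`); it assumes that all $n$ plays are imagined as completed and that I carries off the stakes exactly when I wins at least $k$ of them. It checks instances and is not a substitute for the proof asked for in the exercise.

```python
from itertools import product
from math import comb

def stakes_ratio(n, k):
    """(I's share, II's share) as sums of binomial coefficients."""
    first = sum(comb(n, j) for j in range(k, n + 1))
    second = sum(comb(n, j) for j in range(0, k))
    return first, second

def stakes_by_enumeration(n, k):
    """Count the 2^n sequences of winners directly."""
    wins_for_I = sum(1 for plays in product(("I", "II"), repeat=n)
                     if plays.count("I") >= k)
    return wins_for_I, 2 ** n - wins_for_I

print(stakes_ratio(5, 2), stakes_by_enumeration(5, 2))   # (26, 6) (26, 6)
```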
The sum property of the binomial coefficients also explains the presence of 
some interesting numbers in Pascal’s triangle. 


5.8.3 Explain why the third diagonal from the left in the triangle, namely 1, 3, 6, 
10, 15, 21, ..., consists of the triangular numbers. 


5.8.4 The numbers on the next diagonal, namely 1, 4, 10, 20, 35, ..., can be called
tetrahedral numbers. Why is this an apt description? 


5.9 Fermat’s Little Theorem 


The algebra of binomial coefficients also led to a famous theorem of num- 
ber theory due to Fermat (1640). It is known as his “little” or “lesser” 
theorem to distinguish it from his “last” or “great” theorem (Section 10.1). 
Fermat’s little theorem is the following. 

If $p$ is prime and $\gcd(n, p) = 1$, then $n^{p-1} - 1$ is divisible by $p$ or,
equivalently, $n^p - n$ is divisible by $p$.

The equivalence holds because $n^p - n = n(n^{p-1} - 1)$ is divisible by $p$
if and only if $n^{p-1} - 1$ is, since $p$ is prime and does not divide $n$.
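
The statement is easy to test on small cases. A minimal sketch (the function name `fermat_holds` is ours) that checks both forms of the theorem for a few primes, and shows that the divisibility can fail for a composite modulus:

```python
from math import gcd

def fermat_holds(n, p):
    """True if n^p - n is divisible by p."""
    return (n ** p - n) % p == 0

for p in (2, 3, 5, 7, 11, 13):
    for n in range(1, 50):
        assert fermat_holds(n, p)
        if gcd(n, p) == 1:
            # Equivalent form: n^(p-1) - 1 is divisible by p.
            assert (n ** (p - 1) - 1) % p == 0

print(fermat_holds(2, 4))   # False: 2^4 - 2 = 14 is not divisible by 4
```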

Fermat’s little theorem has recently become indispensable in areas of
applied mathematics, such as cryptography, so it is thought-provoking to
learn that it originated in one of the least applied problems in mathematics,
the construction of perfect numbers. As we saw in Section 3.2, this
depends on the construction of prime numbers of the form $2^m - 1$, and it
was initially for this reason that Fermat became interested in conditions for
$2^m - 1$ to have divisors. At the same time (mid-1630s) he was investigating
the binomial coefficients, and the combination of these two interests very
likely led to the discovery of his little theorem, for $n = 2$.

His actual proof is unknown, but various authors (for example, Weil
(1984), p. 56) have pointed out that the theorem follows immediately from
the fact that $\binom{p}{1}, \binom{p}{2}, \ldots, \binom{p}{p-1}$, for $p$ prime, are divisible by $p$:
$$2^p = (1 + 1)^p = 1 + \binom{p}{1} + \binom{p}{2} + \cdots + \binom{p}{p-1} + 1,$$
hence
$$2^p - 2 = \binom{p}{1} + \binom{p}{2} + \cdots + \binom{p}{p-1}$$
is divisible by $p$, and therefore so is $2^{p-1} - 1$.

But how does one prove that $\binom{p}{1}, \binom{p}{2}, \ldots, \binom{p}{p-1}$ are divisible by $p$? This
follows easily from the Levi ben Gershon formula
$$\binom{p}{k} = \frac{p!}{(p-k)!\,k!},$$
which shows that the prime $p$ is a factor of the numerator but not of the
denominator. The denominator nevertheless divides the numerator, since
$\binom{p}{k}$ is an integer, so (by unique prime factorization) the factor $p$ must
remain after the division has taken place. Fermat may not have had precisely
this result, since he did not yet have Pascal’s combinatorial interpretation
of the binomial coefficients, but he did have the formula
$$n\binom{n+m-1}{m-1} = m\binom{n+m-1}{m},$$
which implies it and from which the divisibility property may be extracted
(see Weil (1984), p. 47).
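
The divisibility of the inner binomial coefficients, and its failure for a composite exponent, can be seen in a few lines. The helper name `inner_coefficients` is ours; `math.comb` is the standard-library binomial coefficient.

```python
from math import comb

def inner_coefficients(p):
    """The binomial coefficients C(p, 1), ..., C(p, p - 1)."""
    return [comb(p, k) for k in range(1, p)]

print(inner_coefficients(7))                  # [7, 21, 35, 35, 21, 7]
print([c % 7 for c in inner_coefficients(7)]) # all zero, since 7 is prime
print([c % 6 for c in inner_coefficients(6)]) # [0, 3, 2, 3, 0], since 6 is not prime
print((2 ** 7 - 2) % 7)                       # 0, as the argument predicts
```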

Thus far we have a proof of Fermat’s little theorem for n = 2. Weil 
(1984) suggests two possible routes to the general theorem from this point. 
One is by iteration of the binomial theorem, a method that was used in the 
first published proof of Fermat’s theorem by Euler (1736). The other is by 
direct application of the multinomial theorem, the method of the earliest 
known proof, which is in an unpublished paper of Leibniz from the late 
1670s (see Weil (1984), p. 56). 

Just as
$$\text{coefficient of } a^{p-k}b^k \text{ in } (a + b)^p = \frac{p!}{(p-k)!\,k!},$$
so
$$\text{coefficient of } a_1^{q_1}a_2^{q_2}\cdots a_n^{q_n} \text{ in } (a_1 + a_2 + \cdots + a_n)^p = \frac{p!}{q_1!\,q_2!\cdots q_n!},$$
where $q_1 + q_2 + \cdots + q_n = p$ (Exercise 5.9.4). This multinomial coefficient is
divisible by $p$, by the same argument as before, provided no $q_i = p$. Thus
the coefficients of all but $a_1^p, a_2^p, \ldots, a_n^p$ in $(a_1 + a_2 + \cdots + a_n)^p$ are divisible
by the prime $p$. It follows, by replacing each of the $n$ terms $a_1, a_2, \ldots, a_n$
by 1, that
$$(1 + 1 + \cdots + 1)^p = 1^p + 1^p + \cdots + 1^p + \text{terms divisible by } p,$$
that is, $n^p - n$ is divisible by $p$. Then if $n$ itself is relatively prime to $p$
(hence not divisible by $p$), we have $n^{p-1} - 1$ divisible by $p$, or the general
Fermat little theorem.
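
The multinomial argument can be traced on a small example. In the sketch below the names `multinomial` and `exponent_tuples` are ours; the code lists every exponent tuple $(q_1, \ldots, q_n)$ with sum $p$, computes $p!/(q_1!\,q_2!\cdots q_n!)$, and checks the two facts used above for $p = 5$, $n = 3$. It illustrates the argument for one case only.

```python
from math import factorial

def multinomial(qs):
    """p! / (q_1! q_2! ... q_n!) where p is the sum of the q_i."""
    result = factorial(sum(qs))
    for q in qs:
        result //= factorial(q)
    return result

def exponent_tuples(p, n):
    """All tuples (q_1, ..., q_n) of nonnegative integers with sum p."""
    if n == 1:
        yield (p,)
        return
    for q in range(p + 1):
        for rest in exponent_tuples(p - q, n - 1):
            yield (q,) + rest

p, n = 5, 3
coeffs = {qs: multinomial(qs) for qs in exponent_tuples(p, n)}
# Setting every a_i = 1 turns the expansion into the sum of all coefficients.
print(sum(coeffs.values()) == n ** p)                                   # True
# Every coefficient except those of a_1^p, ..., a_n^p is divisible by p.
print(all(c % p == 0 for qs, c in coeffs.items() if p not in qs))       # True
print((n ** p - n) % p)                                                 # 0
```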


EXERCISES 

The binomial theorem may be iterated to show that $p$ divides $n^p - n$ as follows.


5.9.1 Use the result $2^p = (1 + 1)^p = 2 + \text{terms divisible by } p$, and its method of
proof, to show that
$$3^p = (2 + 1)^p = 3 + \text{terms divisible by } p.$$


5.9.2 Build on the idea of Exercise 5.9.1 to show that $n^p - n$ is divisible by $p$ for
any positive integer $n$.


5.9.3 Observe the terms divisible by $p$ in the first few rows of Pascal’s triangle,
computed in the previous section.


Like the binomial theorem, the multinomial theorem can be proved combinatorially
by considering the number of ways a term $a_1^{q_1}a_2^{q_2}\cdots a_n^{q_n}$ can arise from the
factors of $(a_1 + a_2 + \cdots + a_n)^p$.


5.9.4 Prove the formula for the multinomial coefficient given above by observing
that the coefficient equals the number of ways of partitioning $p$ things into
disjoint subsets of sizes $q_1, q_2, \ldots, q_n$.


6 Algebraic Geometry


PREVIEW 


The first field of mathematics to benefit from the new language of equa- 
tions was geometry. Around 1630, both Fermat and Descartes realized that 
geometric problems could be translated into algebra by means of coordi- 
nates, and that many problems could then be routinely solved by algebraic 
manipulation. 

The language of equations also provides a simple but natural classifi- 
cation of curves by degree. The curves of degree 1 are the straight lines; 
the curves of degree 2 are the conic sections; so the first “new” curves are 
those of degree 3, the cubic curves. 

Cubic curves exhibit new geometric features—cusps, inflections, and 
self-intersections—so they are considerably more complicated than the 
conic sections. Nevertheless, Newton attempted to classify them, and in 
doing so he discovered that cubic curves, when properly viewed, are not 
as complicated as they seem. 

We will find our way to the “right” viewpoint in Chapters 7 and 11. 
In the meantime we discuss another theorem that depends on the “right” 
viewpoint: Bézout’s theorem, according to which a curve of degree m 
always meets a curve of degree n in mn points. 
6.1 Steps Toward Algebraic Geometry


The basic idea of algebraic geometry is the representation of curves by 
equations, but this is not the whole idea. If it were, then the Greeks would 
be considered the first algebraic geometers. Menaechmus was perhaps the 
first to discover (what we would call) equations of curves, along with 
his discovery of the conic sections. We have seen how equations explain 
how he obtained $\sqrt[3]{2}$ as the intersection of a parabola and a hyperbola
(Section 2.4). Apollonius’ study of conics involved equations, but they
were arrived at by geometric arguments.

What was lacking in Greek mathematics was both the inclination and 
the technique to manipulate equations to obtain information about curves. 
The Greeks used curves to study algebra rather than the other way around. 
Menaechmus’s construction of $\sqrt[3]{2}$ is a fine example of this: extraction of
roots was not a given algebraic operation but one achieved by geometric 
construction. Similarly, an equation was not an entity in its own right but a 
property of a curve that could be discovered after the curve had been con- 
structed geometrically. This was natural as long as equations were written 
in words. When, as in Apollonius, an equation takes half a page to write 
out, it is difficult to form a general concept of equation, function, or curve. 
Hence the lack of a general concept of curve in Greek mathematics—it 
was just too complicated to handle in their language. 

Also lacking was an appreciation of coordinates in geometry. Coordi- 
nates had been used in astronomy and geography since Hipparchus (around 
150 BCE); but they were not used to describe functions or curves until the
Middle Ages, in the work of Oresme (around 1323-1382). Oresme still 
called the coordinates “longitude” and “latitude,” but he used them to rep- 
resent functions such as velocity as a function of time. Setting up the coor- 
dinate system before determining the curve was Oresme’s step beyond the 
Greeks, but he too lacked the algebra to go further. 

The step that finally made algebraic geometry feasible was the solution 
of equations and the improvement of notation in the 16th century, which 
we discussed in the previous chapter. This step made it possible to consider 
equations, and hence curves, in some generality and to manipulate them 
fluently. As we will see in the next section, the two founders of algebraic 
geometry, Fermat and Descartes, both exploited these developments. 

For more on the early history of algebraic geometry, see the book 
Boyer (1956) and the first chapter of Brieskorn and Knörrer (1981).
It may be worth mentioning, at this point, that this kind of geometry
has traditionally been called “analytic” rather than “algebraic.” We are
calling the geometry “algebraic” to emphasize that its objects are alge- 
braically defined (by polynomial equations) and that they are investigated 
by methods of algebra. This is also in line with modern use of the term 
“algebraic geometry.” The methods of analysis come into play only later, 
particularly for curves defined by nonalgebraic means, which Descartes 
called “mechanical.” The term “analytic geometry” is better employed for 
the latter, more general, kind of geometry. 


EXERCISE 


6.1.1 Generalize the idea of Menaechmus to show that any cubic equation 


$$ax^3 + bx^2 + cx + d = 0 \quad \text{with } d \neq 0$$


may be solved by intersecting the hyperbola xy = 1 with a parabola. 


6.2 Fermat and Descartes 


There have been several occasions in the history of mathematics when an 
important discovery was made independently and almost simultaneously 
by two individuals: non-Euclidean geometry by Bolyai and Lobachevsky, 
elliptic functions by Abel and Jacobi, the calculus by Newton and Leibniz, 
for example. To the extent that we can rationally explain these remarkable
events, it is on the basis of ideas already “in the air,” and conditions
becoming favorable for their precipitation. As I tried to show in
the previous section, conditions were favorable for algebraic geometry at 
the beginning of the 17th century. So it is not completely surprising that 
the subject was independently discovered by Fermat (1629) and Descartes 
(1637). (Descartes’s work La Géométrie may in fact have been started in 
the 1620s. In any case it is independent of Fermat, whose work was not 
published until 1679.) 

It is surprising, however, that both Fermat and Descartes began with an 
algebraic solution of the same classical geometric problem, the so-called 
“four-line problem” of Apollonius, and that the main discovery of each 
was that second-degree equations correspond to conic sections. Up to this 
stage Fermat was more systematic than Descartes, but that was as far as 
he went. He was content to leave his work in a “simple and crude” state, 
confident that it would grow in stature when nourished by new inventions. 
Descartes, on the other hand, treated many higher-degree curves and
clearly understood the power of algebraic methods in geometry. He wanted 
to withhold this power from his contemporaries, however, particularly the 
rival mathematician Roberval, as he admitted in a letter to Mersenne (see 
Boyer (1956), p. 104). La Géométrie was written to boast about his dis- 
coveries, not to explain them. There is little systematic development, and 
proofs are frequently omitted with a sarcastic remark such as, “I shall not 
stop to explain this in more detail, because I should deprive you of the plea- 
sure of mastering it yourself” (p. 10). Descartes’s conceit is so great that 
it is a pleasure to see him come a cropper occasionally, as on p. 91: “The 
ratios between straight and curved lines are not known, and I believe can- 
not be discovered by human minds.” He was referring to the then-unsolved 
problem of determining the length of curves, but he spoke too soon, for in 
1657 Neil and van Heuraet found the length of an arc of the semicubical
parabola $y^2 = x^3$, and the calculus soon made such problems routine.
(A full and interesting account of the story of arc length may be found in 
Hofmann (1974), Ch. 8.) 

EXERCISES 

As we now know, all conic sections may be given by the following standard
form equations (from Section 2.4):
$$\frac{x^2}{a^2} + \frac{y^2}{b^2} = 1 \text{ (ellipse)}, \qquad
y = ax^2 \text{ (parabola)}, \qquad
\frac{x^2}{a^2} - \frac{y^2}{b^2} = 1 \text{ (hyperbola)}.$$
The reduction of an arbitrary quadratic equation in $x$ and $y$ to one of these forms
depends on suitable choice of origin and axes, as Fermat and Descartes discovered.
The main steps are outlined in the following exercises.


6.2.1 Show that a quadratic form $ax^2 + bxy + cy^2$ may be converted to a form
$a'x'^2 + b'y'^2$ by suitable choice of $\theta$ in the substitution
$$x = x'\cos\theta - y'\sin\theta,$$
$$y = x'\sin\theta + y'\cos\theta,$$
by checking that the coefficient of $x'y'$ is $(c - a)\sin 2\theta + b\cos 2\theta$.


6.2.2 Deduce from Exercise 6.2.1 that, by suitable rotation of axes, any quadratic
curve may be expressed in the form $a'x'^2 + b'y'^2 + c'x' + d'y' + e = 0$.


6.2.3 If $b' = 0$, but $a' \neq 0$, show that the substitution $x' = x'' + f$ gives either a
standard-form parabola, or the “double line” $x''^2 = 0$.
(Why is this called a “double line,” and is it a section of a cone?)


6.2.4 If both a’ and b’ are nonzero, show that a shift of origin gives the standard 
form for either an ellipse or a hyperbola, or else a pair of lines. 
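
The rotation in Exercise 6.2.1 can be tried numerically. The sketch below is ours: the function name `rotate_quadratic_form` is hypothetical, and the choice $2\theta = \operatorname{atan2}(b, a - c)$ is one angle for which $(c - a)\sin 2\theta + b\cos 2\theta$ vanishes. The new coefficients are obtained by expanding the substitution $x = x'\cos\theta - y'\sin\theta$, $y = x'\sin\theta + y'\cos\theta$.

```python
import math

def rotate_quadratic_form(a, b, c):
    """Rotate a*x^2 + b*x*y + c*y^2 so that the x'y' term disappears.

    Returns (theta, a_new, b_new, cross), where cross is the coefficient
    of x'y' after rotation and should be close to 0.
    """
    theta = 0.5 * math.atan2(b, a - c)
    s, co = math.sin(theta), math.cos(theta)
    a_new = a * co * co + b * s * co + c * s * s    # coefficient of x'^2
    b_new = a * s * s - b * s * co + c * co * co    # coefficient of y'^2
    cross = (c - a) * math.sin(2 * theta) + b * math.cos(2 * theta)
    return theta, a_new, b_new, cross

# Illustrative form x^2 + xy + y^2: a rotation by pi/4 gives 1.5 x'^2 + 0.5 y'^2.
print(rotate_quadratic_form(1, 1, 1))
```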
6.3 Algebraic Curves


I could give here several other ways of tracing and conceiving 
a series of curved lines, each curve more complex than any 
preceding one, but I think the best way to group together all 
such curves and then classify them in order is by recogniz- 
ing the fact that all points of those curves which we may call 
“geometric,” that is, those which admit of precise and exact 
measurement, must bear a definite relation to all points of a 
straight line, and that this relation must be expressed by means 
of a single equation. 

Descartes (1637), p. 48


In this passage Descartes speaks of what we now call algebraic curves. 
The fact that he calls them “geometric” shows his attachment to the Greek 
idea that curves are the product of geometric constructions. He is using the 
notation of equations not to define curves directly but to restrict the notion 
of geometric construction more severely than the Greeks did, thereby 
restricting the concept of curve. As we saw in Section 2.5, the Greeks con- 
sidered some constructions, such as rolling one circle on another, that can 
produce transcendental curves. Descartes called such curves “mechani- 
cal” and found a way to exclude them by his restriction to curves “expressed 
by means of a single equation.” It becomes clear in the lines following the 
preceding quotation that he means polynomial equations, since he gives a 
classification of equations by degree. 

Descartes’s rejection of transcendental curves was short-sighted, since 
the calculus soon provided techniques to handle them, but nevertheless it 
was fruitful to concentrate on algebraic curves. The notion of degree, in 
particular, was a useful measure of complexity. First-degree curves are the 
simplest possible, namely, straight lines. Those of second degree are the 
next simplest, conic sections. With third-degree curves one sees the new 
phenomena of inflections, double points, and cusps. Inflection and cusp 
are familiar from $y = x^3$ and $y^2 = x^3$, respectively; we also saw a cusp
on the cissoid (Section 2.5). A classical example of a cubic with a double
point is the folium (leaf) of Descartes (1638),


$$x^3 + y^3 = 3axy.$$


The “leaf” is the closed portion in the positive quadrant; Descartes missed 
the rest of the curve by ignoring negative coordinates. The complete shape 
of the folium was first given by Huygens (1692). Figure 6.1 is Huygens’s
drawing, which also shows the asymptote to the curve. (A more accurate 
picture of the folium is Figure 10.2.) 


Figure 6.1: Huygens’s drawing of the folium 


An excellent account of the early history of curves can be found in 
Brieskorn and Knörrer (1981), Chapter 1. Many individual curves, with
diagrams, equations, and historical notes, can be found in Gomes Teixeira 
(1995a,b,c). The development of Descartes’s concept of curve has been 
studied by Bos (1981). 


EXERCISES 


The folium is a cubic curve to which Diophantus’s chord method (Section 3.5) 
applies. One takes the line y = tx through the “obvious” rational point (0,0) on 
the curve, and finds its other point of intersection. This construction also enables 
us to express an arbitrary point $(x, y)$ on the curve in terms of the parameter $t$.


6.3.1 Show that the folium of Descartes has parametric equations 


$$x = \frac{3at}{1 + t^3}, \qquad y = \frac{3at^2}{1 + t^3},$$


and use these equations to show that it is tangential to the axes at 0. 
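
Before proving the parametric equations, one can at least confirm numerically that the points $x = 3at/(1 + t^3)$, $y = 3at^2/(1 + t^3)$ lie on the folium $x^3 + y^3 = 3axy$ and on the line $y = tx$. The function name `folium_point` and the sample values of $a$ and $t$ below are ours; the check is a sanity test, not a derivation.

```python
def folium_point(a, t):
    """Point on the folium for parameter t (t = -1 is excluded)."""
    d = 1 + t ** 3
    return 3 * a * t / d, 3 * a * t ** 2 / d

a = 2.0
for t in (-3.0, -0.5, 0.25, 1.0, 4.0):
    x, y = folium_point(a, t)
    # Both printed residuals should be close to 0.
    print(t, x ** 3 + y ** 3 - 3 * a * x * y, y - t * x)
```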


6.3.2 Show that the equation $x^3 + y^3 = 3axy$ of the folium may be written in the
form
$$x + y = \frac{3a}{\frac{x}{y} + \frac{y}{x} - 1}.$$


6.3.3 Show that $x/y$ and $y/x$ tend to $-1$ as $x \to \pm\infty$ on the folium, and hence
deduce the equation of its asymptote from Exercise 6.3.2.

A whole family of “multileaved” curves was studied by Grandi (1723).


6.3.4 The roses of Grandi are given by the polar equations 


$$r = a\cos n\theta$$


for integer values of n. Figure 6.2 shows some of these curves, as given by 
Grandi (1723). Show that the roses of Grandi are algebraic. 


Figure 6.2: Roses of Grandi 


6.3.5 Show that the “rose” for n = 1 is a circle and that the “rose” for n = 2 has 
cartesian equation 
$$(x^2 + y^2)^3 = a^2(x^2 - y^2)^2.$$
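
The cartesian equation for $n = 2$ can be spot-checked by converting points of $r = a\cos 2\theta$ to cartesian coordinates. The function name `rose_residual` is ours, and the check covers only sample points; it is not the algebraic argument the exercise asks for.

```python
import math

def rose_residual(a, theta):
    """(x^2 + y^2)^3 - a^2 (x^2 - y^2)^2 at the point of r = a*cos(2*theta)."""
    r = a * math.cos(2 * theta)
    x, y = r * math.cos(theta), r * math.sin(theta)
    return (x * x + y * y) ** 3 - a * a * (x * x - y * y) ** 2

for theta in (0.0, 0.4, 1.1, 2.0, 3.0):
    print(theta, rose_residual(1.5, theta))   # residuals should be close to 0
```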


6.4 Newton’s Classification of Cubics 


Since first- and second-degree curves are straight lines and conics, they 
were well understood before the advent of algebraic geometry. Up to the 
end of the 18th century most mathematicians considered them as clear as 
could be, and hence an unsuitable subject for the new methods. A famous 
example is the Greek-style treatment of planetary orbits in Newton’s Prin- 
cipia (1687). The classical attitude to low-degree curves was summed up 
by d’Alembert in his article on geometry in the great French Encyclopédie 
(p. 637 of volume 7, 1757): 


Algebraic calculation is not to be applied to the propositions 
of elementary geometry because it is not necessary to use this 
calculus to facilitate demonstrations, and it appears that there 
are no demonstrations which can really be facilitated by this 
calculus except for the solution of problems of second degree 
by the line and circle. 
Thus the first new problem opened up by algebraic geometry, and also
the first considered properly to belong to the subject, was the investigation 
of cubic curves. These curves were classified, more or less completely, by 
Newton (1695) (see Ball (1890) for a commentary). 

Newton (1667) began this work with the general cubic in $x$ and $y$,
$$ay^3 + bxy^2 + cx^2y + dx^3 + ey^2 + fxy + gx^2 + hy + kx + l = 0,$$
made a general transformation of axes (which gives an equation with 84
terms!), then showed that the equation could be reduced to one of the
forms
$$\begin{aligned}
Axy^2 + By &= Cx^3 + Dx^2 + Ex + F, \\
xy &= Ax^3 + Bx^2 + Cx + D, \\
y^2 &= Ax^3 + Bx^2 + Cx + D, \\
y &= Ax^3 + Bx^2 + Cx + D.
\end{aligned}$$


Newton then divided the curves into species according to the roots of the 
right-hand side, obtaining 72 species (and overlooking 6). His paper lacks 
detailed proofs; these were supplied by Stirling (1717), along with four 
of the species Newton missed. Newton’s classification was criticized by 
some later mathematicians, such as Euler, for lacking a general organizing 
principle. But such a principle was already implicit in one of Newton’s 
passing remarks, Section 29, “On the Genesis of Curves by Shadows.” 
This principle, which will be explained in the next chapter, reduces cubics 
to the five types seen in Figure 6.3 (taken from an English translation of 
Newton’s paper in Harris (1708); see Whiteside (1964), p. 158). 

The reader may wonder where the most familiar cubic, $y = x^3$, appears
among these five. The answer is that it is equivalent to the one with a cusp,
in Newton’s Figure 75. This is explained in the next chapter.


EXERCISES 


The cubic curves that Newton called “cuspidate” and “nodated” are alge- 
braically simpler than the others. In particular, they can be parameterized by ratio- 
nal functions. 


6.4.1 Find a parameterization x = p(t), y = q(t) of the semicubical parabola
$y^2 = x^3$ by polynomials p and q, (i) by inspection, (ii) by finding the
second intersection point of the line y = tx through the cusp (0, 0).

6.4.2 Find rational functions x = r(t), y = s(t) that parameterize
$y^2 = x^2(x + 1)$, by finding the second intersection of the line y = tx
through the double point of the curve.

Figure 6.3: Newton’s classification of cubic curves




6.5 Construction of Equations, Bézout’s Theorem 


In Sections 6.1, 6.2, and 6.3 the development of algebraic geometry is 
outlined from the first observations of equations as properties of curves 
to the full realization that equations define curves and that the concept of 
(polynomial) equation is the key to the concept of (algebraic) curve. With 
hindsight, we can say that Descartes’s La Géométrie (1637) was a major 
step in the maturation of the subject, but the book does not conclusively 
establish what algebraic geometry is. In fact, it is largely devoted to two 
transitional topics in the development of the subject: the 16th-century the- 
ory of equations and the now almost forgotten discipline called “construc- 
tion of equations.” 

The paradigm for construction of an equation was Menaechmus’s
construction of $\sqrt[3]{2}$ by intersecting a parabola and hyperbola. From
a geometric point of view, one is using familiar curves (parabola and
hyperbola) to construct a less familiar length ($\sqrt[3]{2}$). This becomes
sharper when expressed algebraically: curves of degree 2 are being used to
solve an equation of degree 3, $x^3 = 2$. In the 1620s Descartes discovered
something more general: a method of solving any third- or fourth-degree
equation by intersecting curves of degree 2, a parabola and a circle. His friend
Beeckman (1628) reported in a note that “M. Descartes made so much 
of this invention that he confessed never to have found anything superior 
himself and even that nobody else had ever found anything better” (trans- 
lation by Bos (1981), p. 330). Descartes was not as superior as he thought, 
since Fermat (1629) independently made the same discovery in an unpub- 
lished work, strengthening the already extraordinary coincidence between 
his work and that of Descartes. However, Fermat apparently did not pursue 
the idea further, and Descartes did. 
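
The algebraic reading of such a construction can be checked numerically.
The following Python sketch is only an illustration: it assumes the pair of
curves y = x^2 (parabola) and xy = 2 (hyperbola), which is one possible
choice and not necessarily the pair Menaechmus used, and the function names
are hypothetical. Eliminating y from the two equations gives x^3 = 2, so the
x-coordinate of the intersection should be the cube root of 2.

```python
def parabola(x):
    """Height of the assumed parabola y = x^2 at x."""
    return x * x


def hyperbola(x):
    """Height of the assumed hyperbola xy = 2 at x > 0."""
    return 2.0 / x


def intersect(lo=1.0, hi=2.0, tol=1e-12):
    """Locate the crossing of the two curves on [lo, hi] by bisection.

    parabola(x) - hyperbola(x) is negative at x = 1 and positive at x = 2,
    and it is increasing for x > 0, so there is exactly one crossing.
    """
    while hi - lo > tol:
        mid = (lo + hi) / 2
        if parabola(mid) - hyperbola(mid) < 0:
            lo = mid
        else:
            hi = mid
    return (lo + hi) / 2


x = intersect()
print(x, x ** 3)  # x**3 should be very close to 2
```

Bisection is used here only because it needs nothing beyond arithmetic; any
root-finding method would serve equally well.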

In La Géométrie Descartes found a particular cubic curve, the so- 
called cartesian parabola, whose intersections with a suitable circle yield 
the solution of any given fifth- or sixth-degree equation. Descartes con- 
cludes the book with this result, blithely telling the reader that 


it is only necessary to follow the same method to construct 
all problems, more and more complex, ad infinitum; for in the 
case of a mathematical progression, whenever the first two or 
three terms are given, it is easy to find the rest. 


Descartes (1637), p. 240 
In reality it was not easy, and efforts to find a satisfactory general
construction for nth-degree equations petered out around 1750. The story of the rise
and fall of this field of mathematics has been told by Bos (1981, 1984). 
In their search for a general construction, mathematicians had casually 
assumed that a curve of degree m meets a curve of degree n in mn points. 
The first statement of this principle, which became known as Bézout’s 
theorem, seems to have been made by Newton on May 30, 1665: 


For y^e number of points in w^ch two lines may intersect can
never bee greater y^n y^e rectangle of y^e numbers of their
dimensions. And they always intersect in soe many points, excepting
those w^ch are imaginarie onely.


Newton (1665b), p. 498

Bézout’s theorem leads one to hope that solutions of an equation r(x) = 0
of degree k = m·n might result from the intersections of a degree m curve
with a degree n curve. In algebraic terms, one seeks equations

p(x, y) = 0, (1)
q(x, y) = 0 (2)

of degrees m, n respectively, which yield the given equation

r(x) = 0 (3)



as “resultant” by elimination of y. This is how mathematicians in the West 
first encountered the problem of elimination, which the Chinese had solved 
some centuries earlier (Section 5.2). 
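
A very small instance of elimination may help fix ideas. The Python sketch
below is an illustration only; the curves are chosen here for convenience
(a horizontal line y = c, of degree 1, and the unit circle, of degree 2) and
the function name is hypothetical. Eliminating y gives r(x) = x^2 + c^2 - 1,
of degree 2 = 1·2, and the example shows that both roots are present only
if nonreal values of x are admitted.

```python
import cmath


def circle_line_roots(c):
    """x-coordinates where the line y = c meets the circle x^2 + y^2 = 1.

    Eliminating y leaves r(x) = x^2 + c^2 - 1 = 0, solved here with a
    complex square root so that nonreal intersections are not lost.
    """
    root = cmath.sqrt(1 - c * c)
    return [root, -root]


for c in (0.6, 2.0):
    for x in circle_line_roots(c):
        # Each root, real or not, satisfies the circle equation with y = c.
        print(c, x, abs(x * x + c * c - 1))
```

For c = 0.6 the line cuts the circle in two real points; for c = 2.0 the
line misses the circle in the real plane and the two roots are imaginary.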

However, quite apart from the fact that construction of equations was 
inverse to elimination, and much harder, two more facts about elimination 
itself were needed: first, that elimination between equations of degrees m 
and n gave a resultant of degree mn; second, that an equation of degree mn 
has mn roots. The second statement, as mentioned in Section 5.7, becomes 
a fact only when complex numbers are admitted. The first becomes a fact 
only when “points at infinity” are admitted. If, for example, (1) and (2) are 
equations of parallel lines, then (3) is of “degree 0” and has no solutions. 
However, one can say that parallel lines meet “at infinity,” and the geomet- 
ric framework for this idea, projective geometry, developed at about the 
same time as algebraic geometry. Unfortunately, it was not realized until 
the 19th century that projective geometry and algebraic geometry needed 
each other. Until then, projective geometry developed without coordinates,
and all attempts to prove Bézout’s theorem—notably by Maclaurin (1720), 
Euler (1748b), Cramer (1750), and Bézout (1779)—foundered for want of 
a proper method for counting points at infinity. As a result, Bézout’s the- 
orem, which turned out to be the main achievement of the theory of con- 
struction of equations, was not properly proved until long after the theory 
itself had been abandoned. 

The origins of projective geometry, and the fruits of its merger with 
algebraic geometry, are discussed in Chapter 7. 


EXERCISES 


We know from Section 5.7 that an arbitrary quartic equation is equivalent to
one of the form

$$x^4 + px^2 + qx + r = 0.$$

6.5.1 Show that any such equation may be solved by finding the intersection
of the parabola $y = x^2$ with another quadratic curve (hence with a conic
section).

6.5.2 Find two parabolas whose intersections give the solutions of
$x^4 = x + 1$, and hence show that this quartic equation has two real roots.


6.6 The Arithmetization of Geometry 


We have stressed that early algebraic geometers—Descartes in particular— 
did not accept that geometry could be based on numbers or algebra, even 
though their work led eventually to this conclusion. Perhaps the first to 
take the idea of arithmetizing geometry seriously was Wallis (1616-1703). 
Wallis (1657), Chs. XXIII and XXV, gave the first arithmetic treatment of 
Euclid’s Books II and V, and Wallis (1655b) had earlier given the first 
purely algebraic treatment of conic sections. He initially derived equa- 
tions from the classical definitions by sections of the cone but then pro- 
ceeded conversely to derive their properties from the equations, “without 
the embranglings of the cone,” as he put it. 

Wallis was ahead of his time. Thomas Hobbes, introduced at the beginning
of Chapter 2, described Wallis’s treatise on conics as a “scab of symbols”
and denounced “the whole herd of them who apply their algebra to
geometry” (Hobbes (1656), p. 316, and Hobbes (1672), p. 447). The example
and authority of Newton probably reinforced the opinion that algebra
was inappropriate in the geometry of lines or conic sections; we saw in
Section 6.4 how this remained the accepted view until at least 1750.
Algebra did not catch on in elementary geometry until it was taken up
by Lagrange and supported by influential textbooks of Monge and Lacroix 
around 1800. But by the time elementary geometry had been brought into 
the theory of equations, higher geometry had broken out, depending more 
and more on calculus and the emerging theories of complex functions, 
abstract algebra, and topology, which bloomed in the 19th century. Higher 
geometry broke away to form the separate fields of differential geometry 
and algebraic geometry, leaving the elementary residue we call “analytic 
geometry” today. 

Despite its lowly status, analytic geometry was given an important 
foundational role by Hilbert (1899). Hilbert took Wallis’s arithmetization 
to its logical conclusion by assuming only the real numbers and sets as 
given and constructing Euclidean geometry from them. 

Thus from the set $\mathbb{R}$ of reals, one constructs the Euclidean plane
as the set of ordered pairs $(x, y)$ (“points”) where $x, y \in \mathbb{R}$.
A straight line is a set of points $(x, y)$ in the plane such that
$ax + by + c = 0$ for some constants $a, b, c$. Lines are parallel if their
$x$ and $y$ coefficients are proportional. The distance between points
$(x_1, y_1)$ and $(x_2, y_2)$ is defined to be
$\sqrt{(x_2 - x_1)^2 + (y_2 - y_1)^2}$. This definition is motivated by the
Pythagorean theorem, which is the keystone in the bridge from arithmetic to
geometry.

With these definitions, all axioms and propositions of Euclid’s geome- 
try become provable propositions about equations. For example, the axiom 
that nonparallel lines have a point in common corresponds to the theorem 
that linear equations 


$$a_1 x + b_1 y + c_1 = 0,$$
$$a_2 x + b_2 y + c_2 = 0$$

have a solution when $a_1 b_2 - b_1 a_2 \neq 0$.
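
These definitions are concrete enough to be turned into a few lines of
Python. The sketch below is only an illustration: representing a line
ax + by + c = 0 by the triple (a, b, c), and the function names parallel,
meet, and distance, are assumptions made here and are not Hilbert’s
notation. Integer coefficients are assumed so that the common point can be
computed exactly.

```python
from fractions import Fraction
from math import sqrt


def parallel(l1, l2):
    """True when the x and y coefficients of the two lines are proportional."""
    a1, b1, _ = l1
    a2, b2, _ = l2
    return a1 * b2 - b1 * a2 == 0


def meet(l1, l2):
    """Common point of two nonparallel lines, by Cramer's rule."""
    a1, b1, c1 = l1
    a2, b2, c2 = l2
    det = a1 * b2 - b1 * a2
    if det == 0:
        raise ValueError("parallel lines have no common point in this plane")
    x = Fraction(b1 * c2 - b2 * c1, det)
    y = Fraction(a2 * c1 - a1 * c2, det)
    return x, y


def distance(p, q):
    """Distance between points, as motivated by the Pythagorean theorem."""
    return sqrt((q[0] - p[0]) ** 2 + (q[1] - p[1]) ** 2)


print(meet((1, -1, 0), (1, 1, -2)))      # the lines y = x and x + y = 2
print(parallel((1, -1, 0), (2, -2, 5)))  # proportional coefficients
print(distance((0, 0), (3, 4)))
```

The axiom about nonparallel lines appears here as the fact that meet
succeeds exactly when the determinant a1*b2 - b1*a2 is nonzero.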

Hilbert did not believe, any more than Newton did, that numbers were 
the true subject matter of geometry. He supported geometric intuition as a 
method of discovery, as the book Hilbert and Cohn-Vossen (1932) makes
clear. The purpose of arithmetization was to give a secure logical foun- 
dation to geometry after the 19th-century developments that discredited 
geometry and installed arithmetic as the ultimate authority in mathemat- 
ics. This foundation is no longer quite as secure as it seemed in 1900, 
as we will see in Chapter 17; nevertheless, it is still the most secure and 
convenient foundation for the many branches of geometry and analysis. 
Projective Geometry


PREVIEW 


At about the same time as the algebraic revolution in classical geometry, 
a new kind of geometry also came to light: projective geometry. Based on 
the idea of projecting objects from space to a plane, or from one plane 
to another, projective geometry was initially the concern of artists. In the 
17th century, only a handful of mathematicians were interested in it, and 
their discoveries were not seen to be important until the 19th century. 

The fundamental quantities of classical geometry, such as length and 
angle, are not preserved by projection, so they have no meaning in pro- 
jective geometry. Projective geometry can discuss only things that are pre- 
served by projection, such as points and lines. 

Surprisingly, there are nontrivial theorems about points and lines. One 
was discovered by the Greek geometer Pappus around 300 CE, and another
by the French mathematician Desargues around 1640. 

Even more surprisingly, there is a numerical quantity preserved by pro- 
jection. It is a “ratio of ratios” of lengths called the cross-ratio. In projec- 
tive geometry, the cross-ratio plays a role similar to that played by length 
in classical geometry. 

One of the virtues of projective geometry is that it simplifies the clas- 
sification of curves. All conic sections, for example, are “projectively the 
same,” and there are only five types of cubic curve. 

The projective viewpoint also removes some apparent exceptions to 
the theorem of Bézout. For example, a line (curve of degree 1) always 
meets another line in exactly one point, because in projective geometry 
even parallel lines meet. 
7.1 Perspective


Perspective may be simply described as the realistic representation of spa- 
tial scenes on a plane. This of course has been a concern of painters since 
ancient times, and some Roman artists seem to have achieved correct
perspective by the first century BCE; an impressive example is shown in
Wright (1983), p. 38. However, the vast majority of ancient paintings show
incorrect perspective. If there was ever a classical theory of perspective, it
was well and truly lost during the Dark Ages. Medieval artists made some 
charming attempts at perspective but always got it wrong. See Figure 7.1, 
for example, which is in The Lives of Sts. Edmund and Fremund by John 
Lydgate, from around 1434, now in the British Library. 


Figure 7.1: Errors in perspective 


The first correct perspective method is usually attributed to the
Florentine painter-architect Brunelleschi (1377-1446), around 1420. The first
published method appears in the treatise On Painting by Alberti (1436). 
The latter method, which became known as Alberti’s veil, used a piece 
of transparent cloth fixed in front of the scene to be painted. Then, view- 
ing the scene with one eye, in a fixed position, one could trace the scene 
directly onto the veil. Figure 7.2 shows this method, with a peephole to 
maintain a fixed eye position, as depicted by Dürer (1525).

Figure 7.2: Dürer’s depiction of Alberti’s veil


Alberti’s veil was fine for painting actual scenes, but to paint an imag- 
inary scene in perspective some theory was required. The basic principles 
Renaissance artists used were the following: 


(i) A straight line in perspective remains straight. 


(ii) Parallel lines either remain parallel or converge to a single point 
(their vanishing point). 


These principles suffice to solve a problem artists frequently encountered: 
the perspective depiction of a square-tiled floor. Alberti (1436) solved the 
special case of this problem in which one set of floor lines is horizontal, 
that is, parallel to the horizon. Alberti’s method is shown in simplified 
form in Figure 7.3. The receding floor lines begin at points equally spaced 
along the base line (imagined to touch the floor) and end at a vanishing 
point on the horizon. The horizontal floor lines are then determined by 
choosing one of them arbitrarily, thus determining one tile in the floor, and 
then producing the diagonal of this tile to the horizon. The intersections 
of this diagonal with the receding lines are the points through which the 
horizontal lines pass. This is certainly true on the actual floor (Figure 7.4);
hence it remains true in the perspective view. 
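
Alberti’s construction can be imitated with coordinates. The Python sketch
below is an illustration under assumptions made here, not Alberti’s own
procedure: the base line is taken to be y = 0, the horizon y = 4, the
vanishing point (2.5, 4), and the arbitrarily chosen first horizontal line
y = 1. These values and all function names are hypothetical.

```python
def line_through(p, q):
    """Coefficients (a, b, c) of the line ax + by + c = 0 through p and q."""
    (x1, y1), (x2, y2) = p, q
    a, b = y2 - y1, x1 - x2
    return a, b, -(a * x1 + b * y1)


def meet(l1, l2):
    """Intersection point of two nonparallel lines (Cramer's rule)."""
    a1, b1, c1 = l1
    a2, b2, c2 = l2
    det = a1 * b2 - b1 * a2
    return (b1 * c2 - b2 * c1) / det, (a2 * c1 - a1 * c2) / det


def alberti_heights(n, horizon=4.0, vanishing_x=2.5, first=1.0):
    """Heights of the first n horizontal floor lines in the picture."""
    vanish = (vanishing_x, horizon)
    # Receding lines run from equally spaced base points to the vanishing point.
    receding = [line_through((float(k), 0.0), vanish) for k in range(n + 1)]
    # The first horizontal line is chosen arbitrarily; it fixes one tile.
    corner = meet(receding[1], (0.0, 1.0, -first))
    # The diagonal of that tile, produced toward the horizon.
    diagonal = line_through((0.0, 0.0), corner)
    # Its intersections with the receding lines locate the horizontals.
    return [meet(diagonal, receding[k])[1] for k in range(1, n + 1)]


heights = alberti_heights(5)
print(heights)
```

In this sketch the heights increase toward the horizon without reaching it,
and no measurement is used beyond the equal spacing along the base line.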


Figure 7.3: Alberti’s method 


Figure 7.4: The actual floor 


EXERCISES 


In almost all paintings of tiled floors, one set of lines is parallel to the horizon. 
However, the principles (i) and (ii) suffice to generate a perspective view of a 
tiled floor given an arbitrarily situated tile, and they show that no measurement is 
needed to achieve correct spacing along the base line in Alberti’s method. 


7.1.1 Use the lines shown in Figure 7.5 to determine all lines in a pavement gen- 
erated by the given tile one by one. (Hint: All the diagonals are parallel.) 


Figure 7.5: Tiled floor with arbitrary orientation 
7.1.2 By using diagonals as in Exercise 7.1.1, show how to generate the lines in
the tiling when the baseline is parallel to the horizon, without making any 
measurements. 


7.2 Anamorphosis 


It is clear from the Alberti veil construction that a perspective view will 
not look absolutely correct except when seen from the artist’s viewpoint. 
Experience shows, however, that distortion is not noticeable except from 
extreme viewing positions. Following the mastery of perspective by the 
Italian artists, an interesting variation developed, in which the picture looks 
right from only one, extreme, viewpoint. The first known example of this 
style, known as anamorphosis, is an undated drawing by Leonardo da 
Vinci from the Codex Atlanticus (compiled between 1483 and 1518). 
Figure 7.6 shows part of this drawing, a child’s face which looks correct 
when viewed with the eye near the right-hand edge of the page. 


Figure 7.6: Leonardo’s drawing of a face 


The idea was taken up by German artists around 1530, famously in 
Holbein’s painting The Two Ambassadors from 1533. A mysterious streak 
across the bottom of the picture becomes a skull when viewed from near 
the picture’s edge (Figure 7.7). For more on the history of anamorphosis,
see Baltrušaitis (1977) and Wright (1983), pp. 146-156. The art of
anamorphosis reached its technically most advanced form in France in the early
17th century. It seems no coincidence that this was also the time and place 
of the birth of projective geometry. In fact, key figures in the two fields, 
Niceron and Desargues, were well aware of each other’s work. 

Figure 7.7: Skull in Holbein’s “Two Ambassadors” and its perspective
view. (Pictures courtesy of Wikimedia)

Niceron (1613-1646) was a student of Mersenne and, like him, a monk
in the order of Minims. He executed some extraordinary anamorphic wall
paintings, up to 55 meters long, and also explained the theory in La
perspective curieuse (1638). Figure 7.8 is his illustration of anamorphosis
of a chair (for other examples, see Baltrušaitis (1977)). Viewed normally,
the chair is like none ever seen, yet from a suitably extreme point one sees
an ordinary chair in perspective. Thus the ordinary view is a perspective
view of the extraordinary view.


Figure 7.8: Niceron’s chair 


This example exposes an important mathematical fact: the inverse of a 
perspective view is not in general a perspective view. Iteration and inver- 
sion of perspective views gives what we now call a projective view, and 
Niceron’s chair shows that projectivity is a broader concept than perspec- 
tivity. As a consequence, projective geometry, which studies properties 
invariant under projection, is broader than the theory of perspective. Per- 
spective itself became a mathematical theory, called descriptive geometry, 
only at the end of the 18th century. 
7.3 Desargues’s Projective Geometry


The mathematical setting in which one can understand Alberti’s veil is the 
family of lines (“light rays”) through a point (the “eye”), together with 
a plane V (the “veil”) (Figure 7.9). In this setting, the problems of
perspective and anamorphosis were not very difficult, but the concepts were
interesting and a challenge to traditional geometric thought. Contrary to 
Euclid, one had the following: 


(i) Points at infinity (“vanishing points”) where parallels met. 


(ii) Transformations that changed lengths and angles (projections). 


Figure 7.9: Seeing through Alberti’s veil 


The first to construct a mathematical theory incorporating these ideas 
was Desargues (1591-1661), although the idea of points at infinity had 
already been used by Kepler (1604), p. 93. The book of Desargues (1639), 
Brouillon projet d’une atteinte aux événemens des rencontres du cône avec
un plan (Schematic Sketch of What Happens When a Cone Meets a Plane), 
suffered an extreme case of delayed recognition, being completely lost 
for 200 years. Fortunately, his two most important theorems, the so-called 
Desargues’s theorem and the invariance of the cross-ratio, were published 
in a book on perspective, Bosse (1648). The text of Desargues (1639) and 
a portion of Bosse (1648) containing Desargues’s theorem may be found 
in Taton (1951). An English translation, with an extensive historical and 
mathematical analysis, is in Field and Gray (1987). 
Kepler and Desargues both postulated one point at infinity on each
line, closing the line to a “circle of infinite radius.” All the lines in a family 
of parallels share the same point at infinity. Nonparallel lines, having a 
finite point in common, do not have the same point at infinity. Thus any 
two distinct lines have exactly one point in common—a simpler axiom 
than Euclid’s. Strangely enough, the line at infinity was only introduced 
into the theory by Poncelet (1822), even though it is the most obvious 
line in perspective drawing, the horizon. Desargues made extensive use of 
projections in the Brouillon projet; he was the first to use them to prove 
theorems about conic sections. 

Desargues’s theorem is a property of triangles in perspective illustrated 
by Figure 7.10. The theorem states that the points X, Y, Z at the intersec- 
tions of corresponding sides lie on a line. This is obvious if the triangles 
are in space, since the line is the intersection of the planes containing them. 
The theorem in the plane is subtly but fundamentally different and requires 
a separate proof, as Desargues realized. In fact, Desargues’s theorem was 
shown to play a key role in the foundations of projective geometry by 
Hilbert (1899). 




Figure 7.10: Desargues’s theorem 


The second theorem of Desargues, invariance of the cross-ratio, was 
already known to the Greek mathematician Pappus, around 300 CE. It is
Proposition 129 in his Collection Book VII, available in English transla- 
tion in Pappus (1986). The theorem was rediscovered by Desargues and 
it answers a natural question about perspective raised by Alberti: since 
length and angle are not preserved by projection, what is? 

No property of three points on a line can be invariant because any 
three points on a line can be projected to any three others (Exercise 7.3.1). 
At least four points are therefore needed, and the cross-ratio is indeed a
projective invariant of four points. If A, B, C, D are four points on a line (in
that order) then their cross-ratio (ABCD) is
$\frac{CA}{CB} \big/ \frac{DA}{DB}$. Its invariance is most
simply seen by reexpressing it in terms of angles using Figure 7.11.


Figure 7.11: Evaluating the cross-ratio 


Let O be any point outside the line and consider the areas of the tri- 
angles OCA, OCB, ODA, and ODB. First compute them from bases on 
AB and height h, then recompute using OA and OB as bases and heights 
expressed in terms of the sines of angles at O: 


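$$\tfrac{1}{2} h \cdot CA = \text{area } OCA = \tfrac{1}{2}\, OA \cdot OC \sin \angle COA,$$

$$\tfrac{1}{2} h \cdot CB = \text{area } OCB = \tfrac{1}{2}\, OB \cdot OC \sin \angle COB,$$

$$\tfrac{1}{2} h \cdot DA = \text{area } ODA = \tfrac{1}{2}\, OA \cdot OD \sin \angle DOA,$$

$$\tfrac{1}{2} h \cdot DB = \text{area } ODB = \tfrac{1}{2}\, OB \cdot OD \sin \angle DOB.$$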


Substituting the values of CA, CB, DA, and DB from these equations we 
find, following Mobius (1827), the cross-ratio in terms of angles at O: 


$$\frac{CA}{CB} \Big/ \frac{DA}{DB} = \frac{\sin \angle COA}{\sin \angle COB} \Big/ \frac{\sin \angle DOA}{\sin \angle DOB}.$$


Any four points A′, B′, C′, D′ in perspective with A, B, C, D from a point
O have the same angles (Figure 7.11); hence they will have the same
cross-ratio. But then so will any four points A″, B″, C″, D″ projectively
related to A, B, C, D, since a projectivity is by definition the product of a
sequence of perspectivities.
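
The invariance can also be observed numerically. The following Python
sketch is an illustration only: the four points on the x-axis, the center
O = (1, 3), and the target line x - 5y - 10 = 0 are arbitrary choices made
here, and the function names are hypothetical. Unsigned lengths are used,
as in the definition above, so the sketch assumes the projected points
appear in the same order as the originals.

```python
from math import dist


def cross_ratio(a, b, c, d):
    """(ABCD) = (CA/CB) / (DA/DB) for four points in order on a line."""
    return (dist(c, a) / dist(c, b)) / (dist(d, a) / dist(d, b))


def project(o, p, line):
    """Point where the line from o through p meets ax + by + c = 0."""
    a, b, c = line
    dx, dy = p[0] - o[0], p[1] - o[1]
    t = -(a * o[0] + b * o[1] + c) / (a * dx + b * dy)
    return (o[0] + t * dx, o[1] + t * dy)


points = [(0.0, 0.0), (1.0, 0.0), (2.0, 0.0), (4.0, 0.0)]  # A, B, C, D
center = (1.0, 3.0)                                         # O
target = (1.0, -5.0, -10.0)                                 # x - 5y - 10 = 0

images = [project(center, p, target) for p in points]      # A', B', C', D'
print(cross_ratio(*points), cross_ratio(*images))
```

The two printed values should agree up to rounding error, although the
individual lengths CA, CB, DA, DB are all changed by the projection.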
EXERCISES


As mentioned above, we cannot hope for an invariant that is simpler than the 
cross-ratio, because any three points in a line are projectively related to any other. 


7.3.1 Show that any three points on a line can be sent to any other three points 
on a line by projection. (You may move the lines to a convenient position.) 


7.4 The Projective View of Curves 


The first works in projective geometry, by Desargues (1639) and Pascal 
(1640), used the language of classical geometry, even though the language 
of equations was available from Descartes (1637). At that time the ad- 
vantages of the projective method were more clearly seen in a classical 
setting. Desargues and Pascal confined themselves to straight lines and 
conic sections, showing how projective geometry could easily reach and 
surpass the results obtained by the Greeks. Moreover, the projective view- 
point gave something else that would have been incomprehensible to the 
Greeks: a clear account of the behavior of curves at infinity. 

For example, Desargues (1639) (in Taton (1951), p. 137) distinguished 
the ellipse, parabola, and hyperbola by their numbers of points at infinity: 
0, 1, and 2, respectively. The points at infinity on the parabola and hyper- 
bola can be seen quite plainly by tilting the ordinary views of them into 
perspective views (Figures 7.12 and 7.13). The parabola has just one point
at infinity because it crosses each ray through 0, except the y-axis, at one
other point.


Figure 7.12: The parabola: direct and perspective view
As for the hyperbola, its two points at infinity are where it touches
its asymptotes, as seen in Figure 7.13. The continuation of the hyperbola 
above the horizon results from projecting the lower branch through the 
same center of projection (Figure 7.14). 


Figure 7.13: The hyperbola: direct and perspective view


Figure 7.14: Branches of the hyperbola 


Projective geometry goes beyond describing the behavior of curves at 
infinity. The line at infinity is no different from any other line and can 
be deprived of its special status. Then all projective views of a curve are
equally valid and one can say, for example, that all conic sections are el- 
lipses when suitably viewed. This is no surprise if one thinks of conic sec- 
tions not as second-degree curves but as sections of the cone. Of course 
they all look the same from the vertex of the cone! 


Cubic Curves 


More surprisingly, a great simplification of cubic curves also occurs when 
they are viewed projectively. As mentioned in Section 6.4, Newton (1695) 
classified cubic curves into 72 types (and missed 6). However, in his 
Section 29, “On the Genesis of Curves by Shadows,” Newton claimed that 
each cubic curve can be projected onto one of just five types. As mentioned 
in Section 6.4, this includes the result that $y = x^3$ can be projected onto
$y^2 = x^3$. The proof of this is an easy calculation when coordinates are
introduced (see Exercise 7.7.2), but one already gets an inkling of it from
the perspective view of $y = x^3$. See Figure 7.15. The lower half of the cusp
is the view of $y = x^3$ below the horizon; the upper half comes from projecting
the view behind one’s head through the eye to the picture plane in front.


Figure 7.15: Perspective view of $y = x^3$


Conversely, $y^2 = x^3$ has an inflection at infinity. Newton’s projective
classification comes from studying the behavior at infinity of all cubics and 
observing that each has characteristics already possessed, not necessarily
at infinity, by curves of the form

$$y^2 = Ax^3 + Bx^2 + Cx + D.$$


Newton had already divided these into five types in his analytic classifica- 
tion. They are the five shown in Figure 6.3. Newton’s result was improved 
only in the 19th century, when projective classification over the complex 
numbers reduced the number of types of cubics to just three. We discuss 
this later in connection with complex numbers (Section 12.5). 


EXERCISES 


As suggested above, the points at infinity of a curve may be counted by con- 
sidering intersections of the curve with lines through the origin, and observing 
where they tend to infinity. 


7.4.1 Use this method to explain why

• the hyperbola xy = 1 has two points at infinity,

• the curve $y = x^3$ has one point at infinity.


Figures 7.12 and 7.13 were made by taking Alberti’s veil to be the (x, z)-
plane in (x, y, z)-space, with the “eye” at (0, −4, 4) viewing the (x, y)-plane
tiled with unit squares.


7.4.2 Find the parametric equations of the line from (0, −4, 4) to (x′, y′, 0),
and hence show that this line meets the veil where

$$x = \frac{4x'}{y' + 4}, \qquad z = \frac{4y'}{y' + 4}.$$


7.4.3 Renaming the coordinates x, z in the veil as X, Y respectively, show that

$$x' = \frac{4X}{4 - Y}, \qquad y' = \frac{4Y}{4 - Y}.$$


7.4.4 Deduce from Exercise 7.4.3 that the points (x′, y′) on the parabola
$y = x^2$ have image on the veil

$$X^2 + \frac{(Y - 2)^2}{4} = 1,$$

and check that this is the ellipse shown in Figure 7.12.
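
A numerical spot check of this setup can be made without doing the algebra.
The Python sketch below is an illustration only, with hypothetical names: it
joins the eye (0, −4, 4) to a point of the floor, finds where that line
crosses the veil y = 0, and evaluates X^2 + (Y − 2)^2/4 for images of points
on the parabola y = x^2. It does not replace the derivations asked for in
the exercises.

```python
EYE = (0.0, -4.0, 4.0)


def to_veil(xp, yp):
    """Image (X, Y) on the veil y = 0 of the floor point (xp, yp, 0)."""
    ex, ey, ez = EYE
    # Parameter t at which EYE + t * ((xp, yp, 0) - EYE) has y-coordinate 0.
    t = -ey / (yp - ey)
    X = ex + t * (xp - ex)
    Y = ez + t * (0.0 - ez)
    return X, Y


for xp in (-3.0, -1.0, 0.0, 0.5, 2.0, 10.0):
    X, Y = to_veil(xp, xp ** 2)  # a point of the parabola y = x^2
    print(xp, X ** 2 + (Y - 2) ** 2 / 4)  # expected to be close to 1
```

Points far out on the parabola have images near (0, 4), the point of the
horizon that represents the parabola’s single point at infinity.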
7.5 The Projective Plane


The way projective geometry puts infinity on the same footing as the finite 
points of the plane is intuitively clear when one thinks of the horizon in a 
picture, which is a line like any other. But what, mathematically speaking, 
is this line we see? To model the situation we take the plane in view to be 
the plane z = −1 in the three-dimensional space with coordinates (x, y, z),
and place our eye at the origin O = (0, 0, 0), as in Figure 7.16.


Figure 7.16: Viewing the plane


Points $P_1, P_2, P_3, \ldots$ in the plane lie on “lines of sight”
$\mathcal{L}_1, \mathcal{L}_2, \mathcal{L}_3, \ldots$ through O, and as the
point $P_n$ tends to infinity its line of sight $\mathcal{L}_n$ tends to
horizontal. Therefore, it is natural to interpret each horizontal line through
O, which does not correspond to an actual point of the plane, as the line
of sight to a “point at infinity” of the plane. More boldly, we can define
the lines through O to be the points of a projective plane, called the real
projective plane $\mathbb{RP}^2$, and the planes through O to be the lines of
$\mathbb{RP}^2$, the so-called projective lines.

Modeling the points of the plane z = −1 by the non-horizontal lines
through O enables us to complete this ordinary plane to a projective plane 
by using the remaining lines through O (which are not called “horizontal” 
for nothing!) to model the points on its horizon. Moreover, the horizontal 
plane through O models the horizon line, reinforcing our intuition that the 
horizon is a line like any other. 
This model of the projective plane is geometrically as natural as one
could wish, and it answers certain questions that are confusing for vision
alone. For example, we can see why it is proper for a line M in the ordinary
plane to have only one point at infinity: because there is only one line
through O to which the lines through $P_1, P_2, P_3, \ldots$ tend as $P_n$
tends to infinity, namely, the parallel to M through O. Thus, Kepler and Desargues
were not far wrong in thinking of a projective line as a circle. The two 
“ends” of the line are joined by its single point at infinity. 
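
The limiting behavior of the lines of sight can be watched numerically. The
Python sketch below is an illustration only: the line M is taken to be
y = 2x + 1 in the plane z = −1, an arbitrary choice made here, and the
function names are hypothetical. Each line of sight is represented by a unit
vector pointing from the eye O toward the point viewed.

```python
from math import sqrt


def line_of_sight(x, y):
    """Unit direction from O = (0, 0, 0) to the point (x, y, -1)."""
    n = sqrt(x * x + y * y + 1.0)
    return (x / n, y / n, -1.0 / n)


def point_on_M(t):
    """A point of the assumed line M: y = 2x + 1 in the plane z = -1."""
    return (t, 2.0 * t + 1.0)


for t in (1.0, 1e3, 1e6, -1e6):
    print(t, line_of_sight(*point_on_M(t)))
```

As t grows in either direction the third coordinate tends to 0 and the
direction tends to a multiple of (1, 2, 0). The two limits are opposite
vectors, but they span the same horizontal line through O, which is why
both “ends” of M acquire the same point at infinity.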

While a projective line is essentially a circle, a projective plane is not 
essentially a sphere, but something more peculiar, as was noticed by Klein 
(1874). $\mathbb{RP}^2$ is essentially a sphere with antipodal points identified, where
antipodal points P, P′ are pairs such as those shown in Figure 7.17: the
diametrically opposite points at which a line through O meets the unit
sphere with center O. “Identifying” the points P, P′ means treating the pair
(P, P′) as a single point. This is appropriate since the pair corresponds to a
single line through O, that is, to a single point of $\mathbb{RP}^2$.


Figure 7.17: Antipodal point pair 


The surface $\mathbb{RP}^2$ modeled by the pairs (P, P′) is strikingly different
from the sphere of individual points P. For example, on a sphere, any
simple closed curve separates the surface into two parts. A “small” closed
curve in $\mathbb{RP}^2$ (that is, one strictly contained in a hemisphere of the model)
also separates it, but a “large” one may not. The equator, for instance,
does not separate the upper hemisphere from the lower, because the hemispheres
are the same place under antipodal point identification! A less
paradoxical view of this is seen by going back to the model of $\mathbb{RP}^2$ whose
elements are lines through O. The lines through the equator do not separate
the lines through the upper hemisphere from the lines through the lower
hemisphere, because these are the same lines.


EXERCISES 
The model of the projective plane whose points are lines through O and whose
lines are planes through O also helps in visualizing other basic properties of
projective lines.


7.5.1 Use this interpretation of projective lines to show that all lines in a family 
of parallels have the same point at infinity. 


7.5.2 Likewise, show that any two projective lines meet in exactly one point. 


Now let us return to the interpretation of the projective plane as a surface, the 
sphere with antipodal points identified. The following result shows another way 
in which the projective plane differs from a sphere. 


7.5.3 Show that a strip of the projective plane surrounding a projective line is a
Möbius band (Figure 7.18).


Figure 7.18: A Möbius band


7.5.4 Why is the Möbius band not a part of the sphere?


7.6 The Projective Line


As we have seen, projective geometry arose from efforts to understand the 
relationship between two and three dimensions. But the idea arising from 
these efforts—that of projection or projective transformations—is interest- 
ing even in one dimension. In this section we make a more detailed study 
of projection from a line to a line, and use it to present a more sophisticated 
concept of projective line. In the process, we meet the concept of linear 
fractional transformation, which plays a key role in many later develop- 
ments. In particular, we will show how linear fractional transformations 
give a new insight into the invariance of the cross-ratio. 

We start by viewing the line as the number line R, and study how the 
numerical values of points are related when we project one line onto an- 
other. The simplest kind of projection is parallel projection (or projection 
from infinity) of a line onto a parallel line, as shown in Figure 7.19. 


Figure 7.19: Projection from infinity (the points 0, 1, 2, 3 on $\mathcal{L}_1$ go to 0 + l, 1 + l, 2 + l, 3 + l on $\mathcal{L}_2$)


Clearly, when we make the natural choice of coordinates on the two
lines, parallel projection sends x on $\mathcal{L}_1$ to x + l on $\mathcal{L}_2$, for some constant l.
We abbreviate this mapping of coordinates by $x \mapsto x + l$.

If we project from a point P at a finite distance, then it is likewise clear
from Figure 7.20 (where we align the zero point on each line with P) that
x on $\mathcal{L}_1$ is sent to kx on $\mathcal{L}_2$ for some nonzero constant k. We abbreviate
this mapping of coordinates by $x \mapsto kx$ ($k \neq 0$).

A more remarkable case is shown in Figure 7.21, where we project a
line $\mathcal{L}_1$ onto a perpendicular line $\mathcal{L}_2$ from a point not on either line, but
equidistant from both. Then, with suitable choice of coordinates, x on $\mathcal{L}_1$
is sent to 1/x on $\mathcal{L}_2$.


Figure 7.20: Projection from a finite point


Figure 7.21: Projection of a line onto a perpendicular line


This makes $\mathcal{L}_2$ a highly distorted image of $\mathcal{L}_1$, with the equally spaced
points 1, 2, 3, 4, ... on $\mathcal{L}_1$ going to the points 1, 1/2, 1/3, 1/4, ... on $\mathcal{L}_2$.
These image points tend to the point 0 on $\mathcal{L}_2$, which is not the projection
of any point on $\mathcal{L}_1$. However, if we extend $\mathcal{L}_1$ by an extra point $\infty$, its
point at infinity, then it seems right to view 0 on $\mathcal{L}_2$ as the projection of
$\infty$ on the extended line $\mathcal{L}_1 \cup \{\infty\}$. It likewise seems right to extend $\mathcal{L}_2$ by
its point $\infty$ at infinity, and to view this point as the projection of 0 on $\mathcal{L}_1$.
If we still claim that this map sends x to 1/x, then we must admit that

\[ 1/0 = \infty \quad \text{and} \quad 1/\infty = 0. \]


We have legalized division by zero! Is this valid? In this limited setting,
yes. Each line $\mathcal{L}$ through O is marked with two symbols: x and 1/x. If
$\mathcal{L}$ is neither vertical nor horizontal, then x and 1/x are the intersections
of $\mathcal{L}$ with $\mathcal{L}_1$ and $\mathcal{L}_2$ respectively; if $\mathcal{L}$ is vertical, then x = 0 is its real
intersection with $\mathcal{L}_1$ and $1/0 = \infty$ is its “intersection at infinity” with its
parallel $\mathcal{L}_2$; if $\mathcal{L}$ is horizontal, then $1/x = 1/\infty = 0$ is its real intersection
with $\mathcal{L}_2$ and $\infty$ is its “intersection at infinity” with its parallel $\mathcal{L}_1$.

Actually, division by zero is valid in the more general and interesting 
setting of linear fractional transformations: 


\[ f(x) = \frac{ax + b}{cx + d}, \quad \text{where } ad - bc \neq 0. \]

These are precisely the functions obtainable as combinations of the functions
$x \mapsto x + l$, $x \mapsto kx$ for $k \neq 0$, and $x \mapsto 1/x$, and they correspond
to arbitrary projections of one projective line onto another. To be precise,
each linear fractional function gives a well-defined and one-to-one map of
$\mathbb{R} \cup \{\infty\}$ to itself, and these maps realize all projections of the projective
line. See the exercises below. Because of this, we call $\mathbb{R} \cup \{\infty\}$, together
with its linear fractional functions, the real projective line $\mathbb{RP}^1$.
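The claim that a linear fractional function is a combination of the three basic maps, and that it is well defined on all of $\mathbb{R} \cup \{\infty\}$, can be tried out in a few lines of Python. This is only a sketch under stated assumptions: the marker `INF` and the function names are hypothetical choices made for the illustration, and the particular chain of basic maps used (scale by c, add d, invert, scale by (bc - ad)/c, add a/c) is one possible breakdown for the case $c \neq 0$.

```python
from fractions import Fraction

INF = "inf"  # hypothetical marker for the single point at infinity


def translate(l):
    """x -> x + l, with infinity left fixed."""
    return lambda x: INF if x == INF else x + l


def scale(k):
    """x -> kx for k != 0, with infinity left fixed."""
    if k == 0:
        raise ValueError("k must be nonzero")
    return lambda x: INF if x == INF else k * x


def invert(x):
    """x -> 1/x with the conventions 1/0 = infinity and 1/infinity = 0."""
    if x == INF:
        return Fraction(0)
    if x == 0:
        return INF
    return 1 / Fraction(x)


def lft(a, b, c, d):
    """Direct definition of f(x) = (ax + b)/(cx + d), requiring ad - bc != 0."""
    a, b, c, d = map(Fraction, (a, b, c, d))
    if a * d - b * c == 0:
        raise ValueError("need ad - bc != 0")

    def f(x):
        if x == INF:
            return INF if c == 0 else a / c
        denominator = c * x + d
        return INF if denominator == 0 else (a * x + b) / denominator

    return f


def lft_from_pieces(a, b, c, d):
    """The same map for c != 0, built only from translate, scale, and invert."""
    a, b, c, d = map(Fraction, (a, b, c, d))
    if c == 0:
        raise ValueError("this breakdown assumes c != 0")
    steps = [scale(c), translate(d), invert,
             scale((b * c - a * d) / c), translate(a / c)]

    def f(x):
        for step in steps:
            x = step(x)
        return x

    return f


f = lft(2, 1, 1, 3)
g = lft_from_pieces(2, 1, 1, 3)
samples = [Fraction(n) for n in range(-5, 6)] + [INF]
agree = all(f(x) == g(x) for x in samples)
```

In this example the pole x = -3 is sent to `INF` and `INF` is sent to a/c = 2 by both versions, so no input is left without an image.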

The linear fractional functions give $\mathbb{RP}^1$ its “projective” nature. $\mathbb{RP}^1$
has no concept of length, because length is not preserved by linear fractional
functions. Not even the ratio of lengths is preserved, as one can see
with the function $x \mapsto 1/x$. However, the cross-ratio is preserved by linear
fractional functions, and hence by projections.
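A quick exact-arithmetic check of this invariance may be helpful. The sketch below takes the cross-ratio of four numbers A, B, C, D to be (C - A)(D - B)/((C - B)(D - A)); the four sample points and the sample function (2x + 1)/(x + 3) are arbitrary choices made for the illustration, picked so that no division by zero occurs.

```python
from fractions import Fraction


def cross_ratio(A, B, C, D):
    """(C - A)(D - B) / ((C - B)(D - A)) for four distinct numbers."""
    return ((C - A) * (D - B)) / ((C - B) * (D - A))


def translate(l):
    return lambda x: x + l


def scale(k):
    return lambda x: k * x


def reciprocal(x):
    return 1 / x


def linear_fractional(a, b, c, d):
    if a * d - b * c == 0:
        raise ValueError("need ad - bc != 0")
    return lambda x: (a * x + b) / (c * x + d)


# Sample points chosen to avoid 0 and the pole x = -3 of the sample function.
points = [Fraction(1), Fraction(2), Fraction(5), Fraction(7)]
f = linear_fractional(Fraction(2), Fraction(1), Fraction(1), Fraction(3))

original = cross_ratio(*points)
images = {
    "x + 4": cross_ratio(*map(translate(Fraction(4)), points)),
    "3x": cross_ratio(*map(scale(Fraction(3)), points)),
    "1/x": cross_ratio(*map(reciprocal, points)),
    "(2x + 1)/(x + 3)": cross_ratio(*map(f, points)),
}
```

Every entry of `images` equals `original`, whereas a ratio of lengths such as (C - A)/(B - A) is already changed by the map 1/x.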

To see why, consider four points A, B,C, D on a line. If we view these 
points as numbers, then their cross-ratio (defined in Section 7.3) becomes 


\[ \frac{CA \cdot DB}{CB \cdot DA} = \frac{(C - A)(D - B)}{(C - B)(D - A)}. \]


The function $x \mapsto x + l$, which adds l to each of A, B, C, D, obviously
does not change the cross-ratio. Neither does the function $x \mapsto kx$ for
$k \neq 0$, which multiplies each of A, B, C, D by k. It is less obvious that the
cross-ratio is preserved by the function $x \mapsto 1/x$, but a simple calculation
confirms this. Thus the cross-ratio is preserved by all combinations of
$x \mapsto x + l$, $x \mapsto kx$ for $k \neq 0$, and $x \mapsto 1/x$, and hence by all linear
fractional functions.


EXERCISES


We can see why each linear fractional function is a combination of functions
of the forms $x \mapsto x + l$, $x \mapsto kx$ for $k \neq 0$, and $x \mapsto 1/x$ by a suitable breakdown
of the fraction $\frac{ax+b}{cx+d}$.


7.6.1 Show that $\frac{ax+b}{cx+d} = \frac{a}{c} + \frac{bc - ad}{c(cx+d)}$ if $c \neq 0$.


7.6.2 Deduce from Exercise 7.6.1 that the function $x \mapsto \frac{ax+b}{cx+d}$ is a combination of
functions $x \mapsto x + l$, $x \mapsto kx$, and $x \mapsto 1/x$ when $c \neq 0$. What if c = 0?


7.6.3 What property of $\frac{ax+b}{cx+d}$ is controlled by the condition $ad - bc \neq 0$?


7.6.4 Verify that the cross-ratio $\frac{(C-A)(D-B)}{(C-B)(D-A)}$ remains unchanged when each of the
points A, B, C, D is replaced by its reciprocal.


It follows that the cross-ratio is preserved by any linear fractional function. It 
remains to show that projections are realized by linear fractional functions. We 
have already done this for projection of a line onto a parallel line. Hence it remains 
to study projection of a line, say the x-axis, onto a line that intersects it, say y = cx. 


7.6.5 Show that projection from the point (a, b) sends the point x = t on the
x-axis to the point on the line y = cx for which

\[ x = \frac{bt}{ct + b - ca}, \]

which is a linear fractional function of t.


7.7 Homogeneous Coordinates 


Representing the points of the projective plane $\mathbb{RP}^2$ by lines through O
gives coordinates to $\mathbb{RP}^2$ via the coordinates (x, y, z) of three-dimensional
space. Such coordinates were invented by Möbius (1827) and Plücker
(1830), and they are called homogeneous because each algebraic curve
in $\mathbb{RP}^2$ is expressed by a homogeneous polynomial equation p(x, y, z) = 0.
The simplest case is that of a projective line, which, as we saw in Section 
7.5, is represented by a plane through O. Its equation therefore has the 
form 


ax+by+cz=0, for some constants a, b,c, not all zero. 


Such an equation is called homogeneous of degree 1, because each nonzero 
term is of degree 1 in the variables x, y, z. 

The homogeneous coordinates of a point P in $\mathbb{RP}^2$ are simply the coordinates
of all points on the line through O that represents P. It follows that
if (x, y, z) is one coordinate triple for P, so is (tx, ty, tz) for any real number
t. And if p(x, y, z) = 0 is the equation of a curve in $\mathbb{RP}^2$, the polynomial p
must be such that

\[ p(tx, ty, tz) = 0 \quad \text{for all real numbers } t. \]

It follows that $p(tx, ty, tz) = t^n p(x, y, z)$ for some n, called the degree of p.
A typical example is the equation 


\[ x^2 - yz = 0, \]

which is homogeneous of degree 2. To see what this curve looks like in an
ordinary plane, such as z = 1, we substitute for the appropriate variable.
With z = 1 we obtain

\[ y = x^2, \]

which is the equation of a parabola in the plane z = 1. Thus $x^2 - yz = 0$
is the projective completion of a parabola, with a point at infinity added
(namely, the y-axis, where x = z = 0).

But $x^2 - yz = 0$ is also the projective completion of a hyperbola. We
see this by intersecting the projective curve with the plane x = 1, obtaining
the hyperbola yz = 1. Surprising as this seems at first, it reflects a fact we
already know from Section 7.4: all conic sections are projectively the
same.
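The two views of the curve $x^2 - yz = 0$ can be checked on sample points. The following sketch is an illustration only: the sample values and the helper `rescale_to_x1` are hypothetical names introduced here. It confirms the degree-2 homogeneity at one point, lists points of the parabola in the plane z = 1 and of the hyperbola in the plane x = 1, and rescales a parabola point to show that the same projective point also appears on the hyperbola.

```python
from fractions import Fraction


def p(x, y, z):
    """The homogeneous polynomial x^2 - yz of degree 2."""
    return x * x - y * z


# Homogeneity at a sample point: p(tx, ty, tz) = t^2 p(x, y, z).
x, y, z, t = Fraction(3), Fraction(-2), Fraction(5), Fraction(7, 2)
homogeneous_ok = p(t * x, t * y, t * z) == t**2 * p(x, y, z)

# In the plane z = 1 the curve is the parabola y = x^2.
parabola_points = [(Fraction(s), Fraction(s) ** 2, Fraction(1))
                   for s in range(-3, 4)]

# In the plane x = 1 the curve is the hyperbola yz = 1.
hyperbola_points = [(Fraction(1), Fraction(s), 1 / Fraction(s))
                    for s in (1, 2, 3, -4)]


def rescale_to_x1(point):
    """Another coordinate triple for the same projective point, with x = 1.

    Only valid when the x-coordinate is nonzero.
    """
    px, py, pz = point
    return (px / px, py / px, pz / px)


moved = rescale_to_x1((Fraction(2), Fraction(4), Fraction(1)))
```

Here the parabola point (2, 4, 1) becomes (1, 2, 1/2), which satisfies yz = 1.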

Homogeneous coordinates also make it easy to show that certain cubic 
curves have the same projective completion (see Exercise 7.7.2). 


Bézout’s Theorem Revisited 


As we saw in Section 6.5, to obtain Bézout’s theorem that a curve of de- 
gree m meets a curve of degree n in mn points we need a precise account 
of points at infinity. Homogeneous coordinates simplify this problem by 
changing it to one about homogeneous polynomials. If $C_m$ is a curve with
homogeneous equation of degree m,

\[ p_m(x, y, z) = 0, \tag{1} \]

and if $C_n$ is a curve with homogeneous equation of degree n,

\[ p_n(x, y, z) = 0, \tag{2} \]

one wishes to show that the equation

\[ r_{mn}(x, y) = 0, \tag{3} \]

which results from eliminating z between (1) and (2), is homogeneous of
degree mn. This is not hard to do (see exercises), but it seems that a homogeneous
formulation of Bézout’s theorem, with a rigorous proof that the
resultant $r_{mn}$ has degree mn, was not given until the late 1800s. According
to Kline (1972), p. 553, the “proper count of multiplicities” was first made
by Halphen in 1873.

An obvious condition must be included in the hypothesis of Bézout’s
theorem: that the curves $C_m$ and $C_n$ have no common component. The
algebraic equivalent of this condition is that the polynomials $p_m$, $p_n$ have
no nonconstant common factor. Then the form of Bézout’s theorem that
can be proved with the help of homogeneous coordinates is: curves $C_m$, $C_n$
with homogeneous equations $p_m(x, y, z) = 0$, $p_n(x, y, z) = 0$ of degrees m,
n and no common component have intersections given by the solutions of
a homogeneous equation $r_{mn}(x, y) = 0$ of degree mn.
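A small numerical experiment may make the statement concrete. The sketch below is an illustration under stated assumptions, not the proof: it takes the circle $x^2 + y^2 - z^2 = 0$ (m = 2) and the line $x + y - z = 0$ (n = 1) as sample curves, writes each as a polynomial in z with coefficients depending on x and y, and eliminates z by means of the determinant of the standard Sylvester matrix of the two coefficient lists. The function names are hypothetical.

```python
from fractions import Fraction


def det(M):
    """Determinant by cofactor expansion along the first row (small matrices)."""
    if len(M) == 1:
        return M[0][0]
    total = 0
    for j, entry in enumerate(M[0]):
        minor = [row[:j] + row[j + 1:] for row in M[1:]]
        total += (-1) ** j * entry * det(minor)
    return total


def sylvester(a, b):
    """Sylvester matrix of a_0 z^m + ... + a_m and b_0 z^n + ... + b_n."""
    m, n = len(a) - 1, len(b) - 1
    size = m + n
    rows = [[0] * i + list(a) + [0] * (size - m - 1 - i) for i in range(n)]
    rows += [[0] * i + list(b) + [0] * (size - n - 1 - i) for i in range(m)]
    return rows


def resultant(x, y):
    """Eliminate z between x^2 + y^2 - z^2 = 0 and x + y - z = 0."""
    a = [Fraction(-1), Fraction(0), x * x + y * y]  # coefficients of z^2, z^1, z^0
    b = [Fraction(-1), x + y]                       # coefficients of z^1, z^0
    return det(sylvester(a, b))


x, y, t = Fraction(3), Fraction(5), Fraction(2)
r = resultant(x, y)
r_scaled = resultant(t * x, t * y)
```

For these curves the determinant works out to -2xy, which is homogeneous of degree mn = 2, and it vanishes at (x, y) = (0, 1) and (1, 0), matching the two intersection points (0, 1, 1) and (1, 0, 1).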


EXERCISES 


7.7.1 We know that the hyperbola yz = 1 has two points at infinity. To which lines
through O do they correspond in the projective completion $x^2 - yz = 0$?


7.7.2 By considering the homogeneous polynomial equation $x^3 - y^2 z = 0$, show
that the cubic curves $y = x^3$ and $y^2 = x^3$ have the same projective completion.


As the Chinese discovered (see Section 5.2), the problem of elimination belongs 
to linear algebra. In the case of Bézout’s theorem, this includes the criterion that 
determinant = 0 for a set of homogeneous equations to have a nonzero solution, 
and it leads to an expression for the resultant $r_{mn}$ as a determinant.


7.7.3 Suppose that

\[ p_m(x, y, z) = a_0 z^m + a_1 z^{m-1} + \cdots + a_m, \]
\[ p_n(x, y, z) = b_0 z^n + b_1 z^{n-1} + \cdots + b_n \]

are homogeneous polynomials of degrees m, n. Thus $a_i(x, y)$ is homogeneous
of degree i and $b_j(x, y)$ is homogeneous of degree j. By multiplying
$p_m$ and $p_n$ by suitable powers of z, show that the equations

\[ p_m = 0 \quad \text{and} \quad p_n = 0 \]

are equivalent to a system of m + n homogeneous linear equations in the
variables $z^{m+n-1}, \ldots, z^2, z^1, z^0$, which in turn is equivalent to

\[
r_{mn}(x, y) =
\begin{vmatrix}
a_0 & a_1 & \cdots & a_m & 0 & \cdots & 0 \\
0 & a_0 & a_1 & \cdots & a_m & \cdots & 0 \\
\vdots & & \ddots & & & \ddots & \vdots \\
0 & \cdots & 0 & a_0 & a_1 & \cdots & a_m \\
b_0 & b_1 & \cdots & b_n & 0 & \cdots & 0 \\
0 & b_0 & b_1 & \cdots & b_n & \cdots & 0 \\
\vdots & & \ddots & & & \ddots & \vdots \\
0 & \cdots & 0 & b_0 & b_1 & \cdots & b_n
\end{vmatrix}
= 0.
\]


7.7.4 Show that a polynomial p(x, y) is homogeneous of degree k $\iff$ $p(tx, ty) = t^k p(x, y)$.


7.7.5 Show that $r_{mn}(tx, ty) = t^{mn} r_{mn}(x, y)$. Hint: Multiply the rows of $r_{mn}(tx, ty)$
by suitable powers of t to arrange that each element in any column contains
the same power of t. Then remove these factors from the columns so that
$r_{mn}(x, y)$ remains.


Calculus


PREVIEW 


The shift towards algebraic thinking was not only a revolution in 
geometry. It was decisive in the second and greatest mathematical revo- 
lution of the 17th century: the invention of calculus. It is true that some 
results we now obtain by calculus were known to the ancients; for exam- 
ple, the area of the parabolic segment was found by Archimedes. But the 
systematic computation of areas, volumes, and tangents became possible 
only when symbolic computation—that is, algebra—became available. 

The dependence of calculus on algebra is particularly clear in the work 
of Newton, whose calculus is essentially the algebra of infinite polynomi- 
als (power series). Moreover, Newton’s starting point was a basic theorem 
about the polynomial (1 + x)”, the binomial theorem, which he extended 
to fractional values of n. 

The calculus of Leibniz was likewise based on algebra—in his case 
the algebra of infinitesimals. Despite doubts about the meaning and exis- 
tence of infinitesimals, Leibniz and his followers obtained correct results 
by computing with them. 

Results that we now obtain through a combination of algebra and limit 
processes were obtained by Leibniz through the algebra of infinitesimals. 
Our derivative dy/dx was, for Leibniz, literally the quotient of the infinitesimal
dy by the infinitesimal dx. And our integral $\int f(x)\,dx$ was, for
Leibniz, literally the sum of the infinitesimals f(x) dx (hence the symbol
$\int$, which is an elongated S for “sum”).


© Springer Nature Switzerland AG 2020
J. Stillwell, Mathematics and Its History, Undergraduate Texts in Mathematics,
https://doi.org/10.1007/978-3-030-55193-3_8


8.1 What Is Calculus? 


Calculus emerged in the 17th century as a system of shortcuts to results 
obtained by the method of exhaustion and as a method for discovering 
such results. The types of problem suited to calculus were finding lengths, 
areas, and volumes of curved figures and determining local properties such 
as tangents, normals, and curvature—in short, what we now recognize as 
problems of integration and differentiation. Equivalent problems of course 
arise in mechanics, where one of the dimensions is time instead of dis- 
tance; hence calculus also made mathematical physics possible. In addi- 
tion, calculus was intimately connected with the theory of infinite series, 
initiating developments that became fundamental in number theory, com- 
binatorics, and probability theory. 

The extraordinary success of calculus was possible, in the first in- 
stance, because it replaced long and subtle exhaustion arguments by short 
routine calculations. As the name suggests, calculus consists of rules for 
calculating results, not their logical justification. Mathematicians of the 
17th century were familiar with the method of exhaustion and assumed 
they could always fall back on it if their results were challenged, but the 
flood of new results became so great that there was seldom time to do so. 
As Huygens (1659), p. 337, wrote, 


Mathematicians will never have enough time to read all the 
discoveries in Geometry (a quantity which is increasing from 
day to day and seems likely in this scientific age to develop 
to enormous proportions) if they continue to be presented in a 
rigorous form according to the manner of the ancients. 


The progress in geometry when Huygens wrote was indeed impres- 
sive, considering the very simple system of calculus then available. Virtu- 
ally all that was known was the differentiation and integration of powers 
of x (possibly fractional) and implicit differentiation of polynomials in x 
and y. However, when allied with algebra and analytic geometry, this was 
sufficient to find tangents, maxima, and minima for all algebraic curves. 
And when allied with Newton’s calculus of infinite series, discovered in 
the 1660s, the rules for powers of x formed a complete system for differ- 
entiation and integration of all functions expressible in power series. 

The subsequent development of calculus is a puzzling exception to 
the normal process of simplification in mathematics. Nowadays we have a
much less elegant system, which downplays the use of infinite series and
complicates the system of rules for differentiation and integration. The
rules for differentiation are still complete, given a sensible set of operations
for constructing functions, but the rules for integration are pathetically
incomplete. They do not suffice to integrate simple algebraic functions
like $\sqrt{1 + x^3}$, or even rational functions with undetermined constants
like $1/(x^5 - x - A)$. Moreover, it is only in recent decades that we have
been able to tell which algebraic functions are integrable by our rules. 
(This little-known result is expounded by Davenport (1981).) 

The conclusion seems to be that, apart from streamlining the language 
slightly, we cannot make calculus any simpler than it was in the 17th cen- 
tury! It is certainly easier to present the history of the subject if we refrain 
from imposing modern ideas. This approach also has the advantage of em- 
phasizing the computational nature of calculus—it is about calculation, 
after all. 

Much has been written on the history of calculus, and some useful 
books are Boyer (1959), Baron (1969), Edwards (1979), and Bressoud 
(2019). The earlier historians are inclined to harp on the question of logical 
justification and to spend a disproportionate amount of time on the way 
it was handled in the 19th century. This tends to obscure the boldness 
and vigor of early calculus, and can be dogmatic about the way in which 
calculus should be justified. Apart from the justification already available 
in the 17th century (the method of exhaustion), there is also a 20th-century 
justification (the new theory of infinitesimals of Robinson (1966)). The 
sheer diversity of foundations for calculus suggests that we have not yet 
got to the bottom of it. 


8.2 Early Results on Areas and Volumes 


The idea of integration is often introduced by approximating the area under
curves $y = x^k$ by rectangles (Figure 8.1), say, from 0 to 1. If the base of the
region is divided into n equal parts, then the heights of the rectangles are
$(1/n)^k, (2/n)^k, \ldots, (n/n)^k$, and the area occupied by the rectangles depends
on the series $1^k + 2^k + \cdots + n^k$. If the curve is revolved around the x-axis,
then the rectangles sweep out cylinders of cross-sectional area $\pi r^2$, where
$r = (1/n)^k, (2/n)^k, \ldots, (n/n)^k$, whose sum depends on $1^{2k} + 2^{2k} + \cdots + n^{2k}$.
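The rectangle sums just described are easy to compute exactly. The sketch below assumes rectangles of width 1/n with heights taken at the right-hand ends of the subintervals, as in the list of heights above; the function names are hypothetical.

```python
from fractions import Fraction


def rectangle_area(k, n):
    """Total area of n rectangles of width 1/n and heights (i/n)^k, i = 1..n."""
    return sum(Fraction(i, n) ** k for i in range(1, n + 1)) / n


def cylinder_volume_over_pi(k, n):
    """Sum of r^2 times the thickness 1/n, with r = (i/n)^k.

    Multiplying the result by pi gives the total volume of the cylinders.
    """
    return sum(Fraction(i, n) ** (2 * k) for i in range(1, n + 1)) / n


area_square_10 = rectangle_area(2, 10)      # rectangles under y = x^2, n = 10
area_cube_1000 = rectangle_area(3, 1000)    # rectangles under y = x^3, n = 1000
```

As n grows, `rectangle_area(k, n)` approaches 1/(k + 1); for k = 3 and n = 1000 the sum exceeds 1/4 by a little more than 1/2000.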

Figure 8.1: Approximating an area by rectangles


After the time of Archimedes, the first new results on areas and volumes
were in fact based on summing these series. The Arab mathematician
al-Haytham (around 965-1039) summed the series $1^k + 2^k + \cdots + n^k$ for
k = 1, 2, 3, 4, and used the result to find the volume of the solid obtained by
rotating the parabola about its base. See Baron (1969), p. 70, or Edwards
(1979), p. 84, for al-Haytham’s method.
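The sums $1^k + 2^k + \cdots + n^k$ can be generated one after another by a short recursion. The sketch below is not al-Haytham’s method (for which see the references just cited); it uses the telescoping idea of Exercise 8.2.1, namely that summing $(m+1)^{k+1} - m^{k+1}$ from m = 1 to n gives $(n+1)^{k+1} - 1$, while the binomial expansion of each difference involves the lower power sums. The function name is hypothetical.

```python
from fractions import Fraction
from math import comb


def power_sum(k, n):
    """1^k + 2^k + ... + n^k, obtained from the lower power sums by telescoping."""
    if k == 0:
        return Fraction(n)
    # (n + 1)^(k+1) - 1 = sum over j = 0..k of C(k+1, j) * power_sum(j, n);
    # solve this relation for the j = k term.
    lower = sum(comb(k + 1, j) * power_sum(j, n) for j in range(k))
    return (Fraction(n + 1) ** (k + 1) - 1 - lower) / (k + 1)


first_hundred = power_sum(1, 100)
fourth_powers = power_sum(4, 10)
```

For k = 1 the recursion reduces to the familiar n(n + 1)/2.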

Cavalieri (1635) extended these results up to k = 9, using them to
obtain the equivalent of

\[ \int_0^a x^k\,dx = \frac{a^{k+1}}{k+1} \]


and conjecturing this formula for all positive integers k. This result was 
proved in the 1630s by Fermat, Descartes, and Roberval. Fermat even 
obtained the result for fractional k (see Baron (1969), pp. 129, 185, and 
Edwards (1979), p. 116). Cavalieri is best known for his method of indivis- 
ibles, an early method of discovery that divided areas into infinitely thin 
strips and volumes into infinitely thin slices. Archimedes’ Method used 
similar ideas but, as mentioned in Section 4.1, this was not known until 
the 20th century. Remarkably, Cavalieri’s contemporary Torricelli (the inventor
of the barometer) speculated that such a method may have been
used by the Greeks. One of Torricelli’s own discoveries, which caused astonishment
at the time, was that the infinite solid obtained by revolving
y = 1/x about the x-axis from 1 to $\infty$ has finite volume (Torricelli (1643)
and Exercise 8.2.3). The philosopher Hobbes (1672) wrote of Torricelli’s 
result that “to understand this for sense, it is not required that a man should 
be a geometrician or logician, but that he should be mad.” 


EXERCISES 


8.2.1 Find $1 + 2 + \cdots + n$ by summing the identity $(m + 1)^2 - m^2 = 2m + 1$ from
m = 1 to n. Similarly find $1^2 + 2^2 + \cdots + n^2$ using the identity

\[ (m + 1)^3 - m^3 = 3m^2 + 3m + 1 \]

together with the previous result. Likewise, find $1^3 + 2^3 + \cdots + n^3$ using the
identity

\[ (m + 1)^4 - m^4 = 4m^3 + 6m^2 + 4m + 1 \]

and so on.


8.2.2 Show that the approximation to the area under $y = x^2$ by rectangles in
Figure 8.1 has value $(2n + 1)n(n + 1)/6n^3$, and deduce that the area under
the curve is 1/3.


8.2.3 Show that the volume of the solid obtained by rotating the portion of y =
1/x from x = 1 to $\infty$ about the x-axis is finite. Show, on the other hand,
that its surface area is infinite.


Cavalieri’s most elegant application of his method of indivisibles was to prove 
Archimedes’ formula for the volume of a sphere. His argument is simpler than that 
of Archimedes, and it goes as follows. 


8.2.4 Show that the slice z = c of the sphere $x^2 + y^2 + z^2 = 1$ has the same area
as the slice z = c of the cylinder $x^2 + y^2 = 1$ outside the cone $x^2 + y^2 = z^2$
(Figure 8.2).


Figure 8.2: Slices considered by Cavalieri 


8.2.5 Deduce from Exercise 8.2.4, and the known volume of the cone, that the
volume of the sphere is 2/3 the volume of the circumscribing cylinder.


8.3 Maxima, Minima, and Tangents


The idea of differentiation is now considered to be simpler than integra- 
tion, but historically it developed later. Apart from the construction of the 
tangent to the spiral $r = a\theta$ by Archimedes, no examples of the characteristic
limiting process

\[ \lim_{\Delta x \to 0} \frac{f(x + \Delta x) - f(x)}{\Delta x} \]


appeared until it was introduced by Fermat in 1629 for polynomials f and 
used to find maxima, minima, and tangents. Fermat’s work, like his dis- 
covery of analytic geometry, was not published until 1679, but it became 
known to other mathematicians through correspondence after a more com- 
plicated tangent method was published by Descartes (1637). 

Fermat’s calculations involve a sleight of hand also used by Newton
and others: introduction of a “small” or “infinitesimal” element E at the
beginning, dividing by E to simplify, then omitting E at the end as if it
were zero. For example, to find the slope of the tangent to $y = x^2$ at any
value x, consider the chord between the points $(x, x^2)$ and $(x + E, (x + E)^2)$
on it:

\[ \text{slope} = \frac{(x + E)^2 - x^2}{E} = \frac{2xE + E^2}{E} = 2x + E. \]

We now get the slope of the tangent by neglecting E. By seeming to claim
that 2x + E = 2x and at the same time $E \neq 0$, this procedure enraged
philosophers such as Hobbes. We know it is only necessary to claim that
$\lim_{E \to 0}(2x + E) = 2x$, but 17th-century mathematicians did not know how
to say this. In any case, they were too carried away with the power of the
method to worry about such criticisms (and it was hard to take philosophers
seriously when they were as obstinate as Hobbes; see previous section).
Fermat’s method applies to all polynomials p(x), since the highest-degree
term in p(x + E) is always canceled by the highest-degree term in
p(x), leaving terms divisible by E. Fermat also extended it to curves given
by polynomial equations p(x, y) = 0. He did this in 1638 when Descartes,
hoping to stump him, proposed finding the tangent to the folium.

The generality of Fermat’s method entitles him to be regarded as one
of the founders of calculus. He could certainly find tangents to all curves
given by polynomial equations y = p(x) and probably to all algebraic
curves p(x, y) = 0. A completely explicit rule for the latter problem was
found by Sluse about 1655 (but not published until Sluse (1673)) and by
Hudde in 1657 (published in the 1659 edition of Descartes’s La Géométrie,
Schooten (1659)). In our notation, if


\[ p(x, y) = \sum a_{ij} x^i y^j = 0, \]

then

\[ \frac{dy}{dx} = -\frac{\sum i\,a_{ij} x^{i-1} y^j}{\sum j\,a_{ij} x^i y^{j-1}}. \]


Nowadays, this result is easily obtained by implicit differentiation (see the 
exercises below), but it can also be obtained by direct manipulation of 
polynomials. 
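The rule is purely mechanical, as a short sketch shows. The representation of the polynomial as a dictionary from exponent pairs (i, j) to coefficients $a_{ij}$, and the function name, are assumptions made for the illustration; the sample curves are a folium $x^3 + y^3 = 3axy$ with a = 2, a circle, and the cubic $y^2 = x^3 - 3x^2 + 3x + 1$.

```python
from fractions import Fraction


def sluse_hudde_slope(terms, x, y):
    """dy/dx at the point (x, y) of the curve sum a_ij x^i y^j = 0.

    `terms` maps exponent pairs (i, j) to the coefficients a_ij.
    """
    numerator = sum(i * a * x ** (i - 1) * y**j
                    for (i, j), a in terms.items() if i > 0)
    denominator = sum(j * a * x**i * y ** (j - 1)
                      for (i, j), a in terms.items() if j > 0)
    if denominator == 0:
        raise ZeroDivisionError("vertical tangent or singular point")
    return -Fraction(numerator) / denominator


# Folium x^3 + y^3 - 6xy = 0 (a = 2) at the point (3, 3).
folium = {(3, 0): 1, (0, 3): 1, (1, 1): -6}
slope_folium = sluse_hudde_slope(folium, Fraction(3), Fraction(3))

# Circle x^2 + y^2 - 25 = 0 at the point (3, 4).
circle = {(2, 0): 1, (0, 2): 1, (0, 0): -25}
slope_circle = sluse_hudde_slope(circle, Fraction(3), Fraction(4))

# Cubic y^2 - x^3 + 3x^2 - 3x - 1 = 0 at the point (0, 1).
cubic = {(0, 2): 1, (3, 0): -1, (2, 0): 3, (1, 0): -3, (0, 0): -1}
slope_cubic = sluse_hudde_slope(cubic, Fraction(0), Fraction(1))
```

The three slopes come out as -1, -3/4, and 3/2 respectively.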


EXERCISES 


For evidence that tangents to algebraic curves may be found without calculus,
it is enough to look more closely at what we called Diophantus’s tangent method
in Section 3.5. In his Arithmetica, Problem 18, Book VI (previously mentioned in
Exercise 3.5.1), Diophantus finds the tangent $y = \frac{3}{2}x + 1$ to $y^2 = x^3 - 3x^2 + 3x + 1$
at the point (0, 1), apparently by inspection. Without mentioning its geometric
interpretation, he simply substitutes $\frac{3}{2}x + 1$ for y in $y^2 = x^3 - 3x^2 + 3x + 1$.


8.3.1 Check that this substitution gives the equation

\[ x^3 - \frac{21}{4}x^2 = 0. \]

What is the geometric interpretation of the double root x = 0?


8.3.2 What would you substitute for y to find the tangent at (0, 1) to the curve
$y^2 = x^3 - 3x^2 + 5x + 1$?


These examples show how tangents can be found by looking for double roots, 
though it requires some foresight to make the right substitution. With calculus, 
the process is more mechanical. 


8.3.3 Derive the formula of Hudde and Sluse by differentiating $\sum a_{ij} x^i y^j = 0$
with respect to x.


8.3.4 Use differentiation to find the tangent to the folium $x^3 + y^3 = 3axy$ at the
point (0, c).


8.4 The Arithmetica Infinitorum of Wallis


Wallis’s efforts to arithmetize geometry were noted in Section 6.6. In his 
Arithmetica Infinitorum, Wallis (1655a) made a similar attempt to arithme- 
tize the theory of areas and volumes of curved figures. Some of his results 
were, understandably, equivalent to results already known. For example, 


he gave a proof that

\[ \int_0^1 x^p\,dx = \frac{1}{p+1} \]

for positive integers p by showing that

\[ \frac{0^p + 1^p + 2^p + \cdots + n^p}{n^p + n^p + n^p + \cdots + n^p} \to \frac{1}{p+1} \quad \text{as } n \to \infty. \]


However, he made a new approach to fractional powers, finding $\int_0^1 x^{m/n}\,dx$
directly rather than by consideration of the curve $y^n = x^m$, as Fermat
had done. He first found $\int_0^1 x^{1/2}\,dx$, $\int_0^1 x^{1/3}\,dx$, ... by considering the areas
complementary to those under $y = x^2$, $y = x^3$, ... (Figure 8.3), then
guessed the results for other fractional powers by analogy with those
already obtained.


Figure 8.3: Areas used by Wallis


Like other early contributors to calculus, Wallis was ambivalent about 
quantities that tended to zero, treating them as nonzero one minute and 
zero the next. For this he received a ferocious blast from his arch-enemy
Thomas Hobbes: “Your scurvy book of Arithmetica infinitorum; where
your indivisibles have nothing to do, but as they are supposed to have 
quantity, that is to say, to be divisibles” (Hobbes (1656), p. 301). Leaving 
aside this fault, which is easily remedied by limit arguments, the reasoning 
of Wallis is extremely incomplete by today’s standards. Observing a pat- 
tern in formulas for p = 1,2,3, for example, he will immediately claim a 
formula for all positive integers p “by induction” and for fractional p “by 
interpolation.” His boldness reached new heights toward the end of the 
Arithmetica infinitorum in deriving his famous infinite product formula, 


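\[ \frac{\pi}{4} = \frac{2}{3} \cdot \frac{4}{3} \cdot \frac{4}{5} \cdot \frac{6}{5} \cdot \frac{6}{7} \cdots \]
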
An exposition of his reasoning may be found in Edwards (1979), pp. 171- 
176, where it is described as “one of the more audacious investigations by 
analogy and intuition that has ever yielded a correct result.” 
However, we must bear in mind that Wallis was offering primarily a
method of discovery, and what a discovery he made! His infinite product
for $\pi$ was not the first ever given, since Viète (1593) had discovered

\[ \frac{2}{\pi} = \cos\frac{\pi}{4}\cos\frac{\pi}{8}\cos\frac{\pi}{16}\cdots
= \sqrt{\tfrac{1}{2}} \cdot \sqrt{\tfrac{1}{2} + \tfrac{1}{2}\sqrt{\tfrac{1}{2}}} \cdot \sqrt{\tfrac{1}{2} + \tfrac{1}{2}\sqrt{\tfrac{1}{2} + \tfrac{1}{2}\sqrt{\tfrac{1}{2}}}} \cdots \]

However, the formula of Viète is based on a clever but simple trick (see
exercises), whereas that of Wallis is of deeper significance. By relating $\pi$ to
the integers through a sequence of rational operations, Wallis uncovered
a sequence of fractions, obtained by terminating the product at the nth
factor, that he called hypergeometric. Similar sequences were later found
to occur as coefficients in series expansions of many functions, which led
to a broad class of functions being called hypergeometric by Gauss. Also,
Wallis’s product was closely related to two other beautiful formulas for
$\pi$ based on sequences of rational operations:

\[ \frac{4}{\pi} = 1 + \cfrac{1^2}{2 + \cfrac{3^2}{2 + \cfrac{5^2}{2 + \cfrac{7^2}{2 + \cdots}}}} \]

and

\[ \frac{\pi}{4} = 1 - \frac{1}{3} + \frac{1}{5} - \frac{1}{7} + \cdots. \]

The continued fraction was obtained by Brouncker from Wallis’s product 
and also published in Wallis (1655b). The series is a special case of the
series

\[ \tan^{-1} x = x - \frac{x^3}{3} + \frac{x^5}{5} - \frac{x^7}{7} + \cdots \]

discovered by the Indian mathematician Madhava in the 15th century (see
Section 9.2) and rediscovered by Newton, Gregory, and Leibniz. Euler
(1748a), p. 311, gave a direct transformation of the series for $\pi/4$ into
Brouncker’s continued fraction. Besides setting off this spectacular chain
reaction, Wallis’s method of interpolation had important consequences in
the work of Newton, who used it to discover the binomial theorem for fractional
powers p (Section 9.3), where $(1 + x)^p$ becomes an infinite series.
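The four formulas of this section can be compared numerically. The sketch below uses ordinary floating-point arithmetic, takes Wallis’s product in the form π/4 = (2/3)(4/3)(4/5)(6/5)(6/7)···, and truncates each expression after a chosen number of steps; the function names and truncation depths are assumptions made for the illustration. It also shows how differently the formulas converge: Viète’s product is accurate to many decimal places after 30 factors, while the other three need about 100,000 steps to get within 0.0001.

```python
import math


def wallis_quarter_pi(n_factors):
    """Partial product (2/3)(4/3)(4/5)(6/5)(6/7)... with n_factors factors."""
    product = 1.0
    for i in range(n_factors):
        product *= (2 * ((i + 1) // 2 + 1)) / (2 * (i // 2) + 3)
    return product


def viete_two_over_pi(n_factors):
    """Partial product of the nested square roots sqrt(1/2), sqrt(1/2 + ...), ..."""
    factor, product = 0.0, 1.0
    for _ in range(n_factors):
        factor = math.sqrt(0.5 + 0.5 * factor)
        product *= factor
    return product


def leibniz_quarter_pi(n_terms):
    """Partial sum 1 - 1/3 + 1/5 - ... with n_terms terms."""
    return sum((-1) ** k / (2 * k + 1) for k in range(n_terms))


def brouncker_four_over_pi(levels):
    """1 + 1^2/(2 + 3^2/(2 + ...)), cut off after the numerator (2*levels - 1)^2."""
    tail = 2.0
    for odd in range(2 * levels - 1, 1, -2):
        tail = 2.0 + odd * odd / tail
    return 1.0 + 1.0 / tail


estimates = {
    "Wallis": 4 * wallis_quarter_pi(100_000),
    "Viete": 2 / viete_two_over_pi(30),
    "series": 4 * leibniz_quarter_pi(100_000),
    "Brouncker": 4 / brouncker_four_over_pi(100_000),
}
```

With these cutoffs the truncated continued fraction is the reciprocal of a partial sum of the series, in line with Euler’s transformation mentioned above.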


EXERCISES 


8.4.1 Use the identity sin x = 2 sin(x/2) cos(x/2) to show that

\[ \frac{\sin x}{2^n \sin(x/2^n)} = \cos\frac{x}{2}\cos\frac{x}{2^2}\cdots\cos\frac{x}{2^n}, \]

whence

\[ \frac{\sin x}{x} = \cos\frac{x}{2}\cos\frac{x}{2^2}\cos\frac{x}{2^3}\cdots. \]


8.4.2 Deduce Viète’s product by substituting $x = \pi/2$.


The equation relating the series for 2/4 to the continued fraction for 4/z, 
namely 


1 1 i‘ 1 1 eaters 1 
aS. GF 7 i 
1+ 
32 
2+ 
52 
2+ 
7p 
2+ 
2+ 
follows immediately from a more general equation 
1 1 n 1 1 es 1 
A BC D 7 A’ 
A+ 
B 
B-A+t ; 
C-B . 
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proved by Euler (1748a), p. 311. The following exercises give a proof of Euler’s 
result. 


8.4.3 Check that
\[
\frac{1}{A} - \frac{1}{B} = \cfrac{1}{A + \cfrac{A^2}{B - A}}.
\]
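8.4.4 When $\frac{1}{B}$ on the left side in Exercise 8.4.3 is replaced by
$\frac{1}{B} - \frac{1}{C}$, which equals
\[
\cfrac{1}{B + \cfrac{B^2}{C - B}}
\]
by Exercise 8.4.3, show that B on the right should be replaced by
$B + \frac{B^2}{C - B}$. Hence show that
\[
\frac{1}{A} - \frac{1}{B} + \frac{1}{C}
= \cfrac{1}{A + \cfrac{A^2}{B - A + \cfrac{B^2}{C - B}}}.
\]
Thus when we modify the tail end of the series (replacing $\frac{1}{B}$ by
$\frac{1}{B} - \frac{1}{C}$), only the tail end of the continued fraction is
affected. This situation continues: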


8.4.5 Generalize your argument in Exercise 8.4.4 to obtain a continued fraction 
for a series with n terms, and hence prove Euler’s equation. 
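Euler's equation for finitely many terms can be tested in exact rational arithmetic. The following is a minimal sketch, assuming positive, strictly increasing integer denominators so that no division by zero occurs; the function names are illustrative only.

```python
from fractions import Fraction

def alternating_sum(denoms):
    """Return 1/A - 1/B + 1/C - ... for the given denominators."""
    return sum(Fraction((-1) ** i, d) for i, d in enumerate(denoms))

def euler_continued_fraction(denoms):
    """Return 1/(A + A^2/(B - A + B^2/(C - B + ...))), ending at the last denominator."""
    previous = [0] + list(denoms[:-1])
    level = None
    # Build the fraction from the innermost level outward.
    for d, p in zip(reversed(denoms), reversed(previous)):
        if level is None:
            level = Fraction(d - p)
        else:
            level = (d - p) + Fraction(d * d) / level
    return 1 / level

odd = [1, 3, 5, 7, 9, 11]
print(alternating_sum(odd), euler_continued_fraction(odd))
```

With the odd numbers as denominators, the two printed fractions are a partial sum of the series for $\pi/4$ and the corresponding truncation of the reciprocal of Brouncker's continued fraction, and they should coincide.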


8.5 Newton’s Calculus of Series 


Newton made many of his most important discoveries in 1665/6, after 
studying the works of Descartes, Viète, and Wallis. In Schooten’s edition
of La Géométrie he encountered Hudde’s rule for tangents to algebraic 
curves, which was virtually a complete differential calculus from New- 
ton’s viewpoint. Although Newton made contributions to differentiation 
that are useful to us—the chain rule, for example—differentiation was a 
minor part of his calculus, which depended mainly on the manipulation of 
infinite series. Thus it is misleading to describe Newton as a founder of 
calculus unless one understands calculus, as he did, as an algebra of infi- 
nite series. In this calculus, differentiation and integration are carried out 
term by term on powers of x and hence are comparatively trivial. 

At the beginning of his main work on calculus, A Treatise of the Meth- 
ods of Series and Fluxions (also known by its abbreviated Latin name of 
De methodis), Newton likens the role of infinite series to the role of infinite 
decimals: 
Since the operations of computing in numbers and with variables
are closely similar ... I am amazed that it has occurred
to no one (if you except N. Mercator with his quadrature of 
the hyperbola) to fit the doctrine recently established for dec- 
imal numbers in similar fashion to variables, especially since 
the way is then open to more striking consequences. For since 
this doctrine in species has the same relationship to Algebra 
that the doctrine in decimal numbers has to common Arith- 
metic, its operations of Addition, Subtraction, Multiplication, 
Division and Root extraction may be easily learnt from the 
latter’s. 

Newton (1671), pp. 33-35 

The quadrature (area determination) of the hyperbola mentioned by 
Newton was the result that we would write as 


\[
\int_0^x \frac{dt}{1+t} = x - \frac{x^2}{2} + \frac{x^3}{3} - \frac{x^4}{4} + \cdots,
\]


first published in Mercator (1668). Newton had discovered the same result 
in 1665, and it was partly his dismay in losing priority that led him to 
write De methodis and an earlier work De analysi (Newton (1669); the full 
title in English is On Analysis by Equations Unlimited in Their Number of 
Terms). Newton also independently discovered the series for $\tan^{-1} x$, $\sin x$,
and cos x in De analysi, without knowing that all three series had already 
been discovered by Indian mathematicians. See Section 9.2. 

Newton rediscovered the Mercator and Indian results by the method of 
expanding a geometric series and integrating term by term. In our notation, 


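\[
\int_0^x \frac{dt}{1+t} = \int_0^x (1 - t + t^2 - t^3 + \cdots)\,dt
= x - \frac{x^2}{2} + \frac{x^3}{3} - \frac{x^4}{4} + \cdots
\]
and
\[
\tan^{-1} x = \int_0^x \frac{dt}{1+t^2} = \int_0^x (1 - t^2 + t^4 - t^6 + \cdots)\,dt
= x - \frac{x^3}{3} + \frac{x^5}{5} - \frac{x^7}{7} + \cdots.
\]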
He routinely used these methods in De analysi and De methodis, but greatly
extended their scope by algebraic manipulation. Not only did he find sums,
products, quotients, and roots, as foreshadowed in his introduction to De
methodis, but his root extractions also extended to the general construction
of inverse functions by the new idea of inverting infinite series. For example,
after Newton (1671), p. 61, found the series
$x - \frac{1}{2}x^2 + \frac{1}{3}x^3 - \frac{1}{4}x^4 + \cdots$
for $\int_0^x dt/(1 + t)$, which is $\log(1 + x)$, he set
\[
y = x - \frac{1}{2}x^2 + \frac{1}{3}x^3 - \frac{1}{4}x^4 + \cdots \qquad (1)
\]
and solved (1) for x (which we recognize to be the exponential function
$e^y$, minus 1). His method amounts to setting $x = a_0 + a_1 y + a_2 y^2 + \cdots$,
substituting in the right-hand side of (1), and determining $a_0, a_1, a_2, \ldots$
in succession by comparing with the coefficients on the left-hand side.
Newton found the first few terms,
\[
x = y + \frac{1}{2}y^2 + \frac{1}{6}y^3 + \frac{1}{24}y^4 + \frac{1}{120}y^5 + \cdots,
\]
then confidently guessed that $a_n = 1/n!$ in the manner of Wallis. As he put
it, “Now after the roots have been extracted to a suitable period, they may
sometimes be extended at pleasure by observing the analogy of the series.”
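The comparison of coefficients described above can be mechanized. The sketch below is not Newton's own tabular layout but a modern restatement of the same idea, using exact fractions and truncated power series; the names `mul` and `invert_series` are chosen here for illustration, and the routine assumes the series to be inverted starts with $0 + 1\cdot x$.

```python
from fractions import Fraction

def mul(p, q, order):
    """Multiply two power series (coefficient lists), dropping terms beyond y**order."""
    out = [Fraction(0)] * (order + 1)
    for i, pi in enumerate(p):
        for j, qj in enumerate(q):
            if i + j <= order:
                out[i + j] += pi * qj
    return out

def invert_series(c, order):
    """Given y = c[1]*x + c[2]*x**2 + ... with c[0] = 0 and c[1] = 1,
    return a with x = a[1]*y + a[2]*y**2 + ... up to y**order."""
    a = [Fraction(0)] * (order + 1)
    a[1] = Fraction(1)
    for n in range(2, order + 1):
        power = a[:]                      # coefficients of x in powers of y
        total = Fraction(0)
        for k in range(2, n + 1):
            power = mul(power, a, order)  # now the coefficients of x**k
            total += c[k] * power[n]
        # The coefficient of y**n on the right of (1) must vanish for n >= 2.
        a[n] = -total
    return a

order = 6
log_coeffs = [Fraction(0)] + [Fraction((-1) ** (k + 1), k) for k in range(1, order + 1)]
a = invert_series(log_coeffs, order)
print(a[1:])  # expect 1, 1/2, 1/6, 1/24, ..., that is, 1/n!
```

The coefficient of $y^n$ in $x^k$ for $k \ge 2$ involves only $a_1, \ldots, a_{n-1}$, which is why each $a_n$ can be determined in succession, exactly as the text describes.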


De Moivre (1698) gave a formula for inverting series that justifies such 
conclusions; Newton astonishes us by finding such an elegant result by 
such a forbidding method. His discovery of the sine series (Newton (1669), 
pp. 233, 237) is even more amazing. First he used the binomial series 

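\[
(1 + a)^p = 1 + pa + \frac{p(p-1)}{2!}a^2 + \frac{p(p-1)(p-2)}{3!}a^3 + \cdots
\]
(though not with the natural choice $a = -x^2$, $p = -\frac{1}{2}$) to obtain
\[
z = \sin^{-1} x = x + \frac{1}{2}\,\frac{x^3}{3} + \frac{1\cdot 3}{2\cdot 4}\,\frac{x^5}{5}
+ \frac{1\cdot 3\cdot 5}{2\cdot 4\cdot 6}\,\frac{x^7}{7} + \cdots
\]
by term-by-term integration, and then casually stated “I extract the root,
which will be
\[
x = z - \frac{1}{6}z^3 + \frac{1}{120}z^5 - \frac{1}{5040}z^7 + \frac{1}{362880}z^9 - \cdots
\]
”
adding a few lines later that the coefficient of $z^{2n+1}$ is, up to sign, $1/(2n + 1)!$.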
EXERCISES


Newton inverted series by a tabular method like the following, which shows
the coefficients of $1, y, y^2, y^3, \ldots$ in x and its powers.

\[
\begin{array}{c|cccc}
 & 1 & y & y^2 & y^3 \\ \hline
x & a_0 & a_1 & a_2 & a_3 \\
x^2 & a_0^2 & 2a_0a_1 & 2a_0a_2 + a_1^2 & 2a_0a_3 + 2a_1a_2
\end{array}
\]


8.5.1 Use the rows shown to substitute series in powers of y for x and $x^2$ in
$y = x - \frac{x^2}{2} + \cdots$, and hence show that $a_0 = 0$, $a_1 = 1$, and $a_2 = 1/2$ in turn,
by comparing coefficients on the two sides of the equation.


8.5.2 Compute the first few entries in the third row of the table (the coefficients
of $x^3$), and hence show that $a_3 = 1/6$.


This shows why the inverse function $x = e^y - 1$ has a power series that begins
\[
y + \frac{1}{2}y^2 + \frac{1}{6}y^3 + \cdots.
\]

8.5.3 Show that the binomial series gives
\[
\frac{1}{\sqrt{1 - t^2}} = 1 + \frac{1}{2}t^2 + \frac{1\cdot 3}{2\cdot 4}t^4
+ \frac{1\cdot 3\cdot 5}{2\cdot 4\cdot 6}t^6 + \cdots.
\]


8.5.4 Use Exercise 8.5.3 and $\sin^{-1} x = \int_0^x dt/\sqrt{1 - t^2}$ to derive Newton’s series
for $\sin^{-1} x$.


8.6 The Calculus of Leibniz 


Newton’s epoch-making works (1669, 1671) were circulated among some 
of his contemporaries but, incredible as it now seems, were not published 
at the time. The reasons seem complicated—see Westfall (1980), p. 231— 
but at any rate, the first published paper on calculus was not by Newton but 
by Leibniz (1684). This led to Leibniz’s initially receiving credit for the 
calculus and later to a bitter dispute with Newton and his followers over 
the question of priority for the discovery. 

There is no doubt that Leibniz discovered calculus independently, that 
he had a better notation, and that his followers contributed more to the 
spread of calculus than did Newton’s. Leibniz’s work lacked the depth 
and virtuosity of Newton’s, but then Leibniz was a librarian, a philoso- 
pher, and a diplomat with only a part-time interest in mathematics. His 
Nova methodus (1684) is a relatively slight paper, though it does lay down
some important fundamentals—the sum, product, and quotient rules for 
differentiation—and it introduces the dy/dx notation we now use. How- 
ever, dy/dx was not just a symbol for Leibniz, as it is for us, but literally a 
quotient of infinitesimals dy and dx, which he viewed as differences (hence 
the symbol d) between neighboring values of y and x, respectively. 

He also introduced the integral sign, $\int$, in his De geometria (1686) and
proved the fundamental theorem of calculus, that integration is the inverse
of differentiation. This result was known to Newton and even, in a geometric
form, to Newton’s teacher Barrow, but it became more transparent
in Leibniz’s formalism. For Leibniz, $\int$ meant “sum,” and $\int f(x)\,dx$ was
literally a sum of terms $f(x)\,dx$, representing infinitesimal areas of height
$f(x)$ and width $dx$. The difference operator d yields the last term $f(x)\,dx$
in the sum, and dividing by the infinitesimal $dx$ yields $f(x)$. So voilà!
\[
\frac{d}{dx}\int f(x)\,dx = f(x).
\]
This is the fundamental theorem of calculus.

The Leibniz fundamental theorem can be viewed as infinitesimal geometry
by interpreting $\int f(x)\,dx$ as the area A(x) under the curve y = f(t)
between t = a and t = x (Figure 8.4). Then an infinitesimal increase in t
from x to x + dx increases A(x) by an infinitesimal amount dA(x), the area
of an infinitesimal rectangle of width dx and height f(x).


Figure 8.4: Fundamental theorem of calculus as infinitesimal geometry
Thus A(x) is an antiderivative¹ of f(x):
\[
dA(x) = f(x)\,dx, \quad\text{and therefore}\quad \frac{dA(x)}{dx} = f(x).
\]

Leibniz’s strength lay in the identification of important concepts, rather 
than in their technical development. He introduced the word “function” 
and was the first to begin thinking in function terms. He made the dis- 
tinction between algebraic and transcendental functions and, in contrast to 
Newton, preferred “closed-form” expressions to infinite series. Thus the 
evaluation of $\int f(x)\,dx$ for Leibniz was the problem of finding a known
function whose derivative was f(x), whereas for Newton it was the prob- 
lem of expanding f(x) in series, after which integration was trivial. 

The search for closed forms was a wild goose chase but, like many 
efforts to solve intractable problems, it led to worthwhile results in other 
directions. Attempts to integrate rational functions raised the problem of 
factorization of polynomials and led ultimately to the fundamental theorem
of algebra (see Chapter 11). Attempts to integrate $1/\sqrt{1 - x^4}$ led to
the theory of elliptic functions (Chapter 10). 

As mentioned in Section 8.1, the problem of deciding which algebraic 
functions may be integrated in closed form has been solved only recently, 
though not in a way suitable for calculus textbooks, which have basically 
not advanced much further than Leibniz. (One thing that has changed: it is 
now much easier to publish a calculus book than it was for Newton!) 


EXERCISES 


Leibniz (1702) was stymied by the integral $\int \frac{dx}{x^4 + 1}$, because he did not spot
the factorization of $x^4 + 1$ into real quadratic factors.


8.6.1 Writing $x^4 + 1 = x^4 + 2x^2 + 1 - 2x^2$ or otherwise, split $x^4 + 1$ into real
quadratic factors.


8.6.2 Use the factors in Exercise 8.6.1 to express $\frac{1}{x^4 + 1}$ in the partial fraction form
\[
\frac{x + \sqrt{2}}{q_1(x)} - \frac{x - \sqrt{2}}{q_2(x)},
\]
where $q_1(x)$ and $q_2(x)$ are real quadratic polynomials.


8.6.3 Without working out all the details, explain how the partial fractions in
Exercise 8.6.2 can be integrated in terms of rational functions and the $\tan^{-1}$
function.


¹The fundamental theorem says that in calculus you only have to know differentiation,
but you have to know it backwards.


9 Infinite Series


PREVIEW 


As we saw in the previous chapter, many calculus problems have a solution
that can be expressed as an infinite series. It is therefore useful to be able 
to recognize important individual series and to understand their general 
properties and capabilities. This is the aim of the present chapter. 

Starting with the infinite geometric series, already known to Euclid, we
discuss the handful of examples known before the invention of calculus.
These include the harmonic series $1 + 1/2 + 1/3 + 1/4 + \cdots$, studied by
Oresme around 1350, and the stunning series for the inverse tangent, sine,
and cosine, discovered by Indian mathematicians in the 15th century.

The invention of calculus in the 17th century released a flood of new
series, mostly of the form $a_0 + a_1x + a_2x^2 + \cdots$ (called power series), but
also some variations, such as generalizations of the harmonic series.

Euler (1748a) introduced the generalization
\[
1 + 1/2^s + 1/3^s + 1/4^s + \cdots,
\]
whose value for s = 2 he had already shown to be $\pi^2/6$. He also showed
that, for s > 1, the series equals the infinite product
\[
(1 - 1/2^s)^{-1}(1 - 1/3^s)^{-1}(1 - 1/5^s)^{-1}\cdots(1 - 1/p^s)^{-1}\cdots
\]
over all the prime numbers p. This discovery of Euler’s opened a new path
to the secrets of the primes, exploration of which continues to this day.

The book Euler (1748a), whose full title is Introduction to the Analysis 
of the Infinite, was intended by Euler to be preparation for calculus. Infinite 
sums and products were the “pre-calculus” of the 18th century! 
9.1 Early Results


Infinite series were present in Greek mathematics, though the Greeks tried
to deal with them as finitely as possible by working with arbitrary finite
sums $a_1 + a_2 + \cdots + a_n$ instead of infinite sums $a_1 + a_2 + \cdots$. However,
this is just the difference between potential and actual infinity. There is no
question that Zeno’s paradox of the dichotomy (Section 4.1), for example,
concerns the decomposition of the number 1 into the infinite series
\[
\frac{1}{2} + \frac{1}{2^2} + \frac{1}{2^3} + \frac{1}{2^4} + \cdots
\]
and that Archimedes found the area of the parabolic segment (Section 4.4)
essentially by summing the infinite series
\[
1 + \frac{1}{4} + \frac{1}{4^2} + \frac{1}{4^3} + \cdots = \frac{4}{3}.
\]
Both these examples are special cases of the result we express as summation
of a geometric series
\[
a + ar + ar^2 + ar^3 + \cdots = \frac{a}{1 - r} \quad\text{when } |r| < 1.
\]


The first examples of infinite series other than geometric series
appeared in the Middle Ages. In a book from around 1350, called the Liber
calculationum, Richard Suiseth (or Swineshead, known as the Calculator)
used a very lengthy verbal argument to show that
\[
\frac{1}{2} + \frac{2}{2^2} + \frac{3}{2^3} + \frac{4}{2^4} + \cdots = 2.
\]
The argument is reproduced in Boyer (1959), p. 78. At about the same
time, Oresme (1350b), pp. 413-421, summed this and similar series by
geometric decomposition as in Figure 9.1, showing that
\[
2 = \frac{1}{2} + \frac{2}{2^2} + \frac{3}{2^3} + \frac{4}{2^4} + \cdots.
\]


Actually Oresme gives only the last picture in the figure, but it seems 
likely he arrived at it by cutting up an area of two square units as shown, 
judging from his opening remark: “A finite surface can be made as long 
as we wish, or as high, by varying the extension without increasing the 
size.” The region constructed by Oresme, incidentally, is perhaps the first 
Figure 9.1: Oresme’s summation


example of the phenomenon encountered by Torricelli (Section 8.2) in his 
hyperbolic solid of revolution—infinite extent but finite content. 

Another important discovery of Oresme (1350a) was the divergence of 
the harmonic series 


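\[
1 + \frac{1}{2} + \frac{1}{3} + \frac{1}{4} + \frac{1}{5} + \cdots.
\]

His proof was by an elementary argument that is now standard:
\[
\begin{aligned}
&1 + \left(\frac{1}{2}\right) + \left(\frac{1}{3} + \frac{1}{4}\right)
+ \left(\frac{1}{5} + \frac{1}{6} + \frac{1}{7} + \frac{1}{8}\right) + \cdots \\
&\quad > 1 + \frac{1}{2} + \left(\frac{1}{4} + \frac{1}{4}\right)
+ \left(\frac{1}{8} + \frac{1}{8} + \frac{1}{8} + \frac{1}{8}\right) + \cdots \\
&\quad = 1 + \frac{1}{2} + \frac{1}{2} + \frac{1}{2} + \cdots.
\end{aligned}
\]

Thus by repeatedly doubling the number of terms collected in successive
groups, we can indefinitely obtain groups of sum $> \frac{1}{2}$, enabling the
sum to grow beyond all bounds.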
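Oresme's grouping is easy to confirm with exact fractions. The following sketch uses helper names (`harmonic`, `group_sum`) invented for this illustration; it checks that each doubled group contributes at least 1/2, so that the partial sum up to $1/2^j$ is at least $1 + j/2$.

```python
from fractions import Fraction

def harmonic(n):
    """Return 1 + 1/2 + ... + 1/n exactly."""
    return sum(Fraction(1, k) for k in range(1, n + 1))

def group_sum(j):
    """Return 1/(2**(j-1) + 1) + ... + 1/2**j, the j-th group in Oresme's proof."""
    return sum(Fraction(1, k) for k in range(2 ** (j - 1) + 1, 2 ** j + 1))

for j in range(1, 8):
    # Each group has 2**(j-1) terms, every one at least 1/2**j.
    print(j, float(group_sum(j)), float(harmonic(2 ** j)), 1 + j / 2)
```

The divergence is very slow: doubling the number of terms adds only a bounded amount (the group sums stay below 1) yet never less than 1/2.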


EXERCISES 


Oresme’s proof by partitioning the harmonic series into
\[
1 + \frac{1}{2} + \left(\frac{1}{3} + \frac{1}{4}\right)
+ \left(\frac{1}{5} + \frac{1}{6} + \frac{1}{7} + \frac{1}{8}\right) + \cdots
\]
has the following geometric counterpart.

Figure 9.2: Comparing $1 + \frac{1}{2} + \frac{1}{3} + \cdots + \frac{1}{n}$ with an area

9.1.1 By referring to Figure 9.2, show that
\[
1 + \frac{1}{2} + \frac{1}{3} + \cdots + \frac{1}{n} >
\text{area under } y = \frac{1}{x} \text{ between } x = 1 \text{ and } x = n + 1.
\]


9.1.2 Now partition this area under y = 1/x into the pieces between x = 1 and
x = 2, x = 2 and x = 4, x = 4 and x = 8, ..., and show that all these pieces
have the same area. (This can even be done without using calculus, if you
use the argument of Exercises 4.4.1 and 4.4.2.)


9.1.3 Deduce from Exercise 9.1.2 that the area from x = 1 to x = n, and hence
the sum $1 + \frac{1}{2} + \frac{1}{3} + \cdots + \frac{1}{n}$, tends to infinity.


The area under y = 1/x from x = 1 to x = n + 1 is of course log(n + 1), so Figure
9.2 shows that $1 + \frac{1}{2} + \frac{1}{3} + \cdots + \frac{1}{n} > \log(n + 1)$. As $n \to \infty$, these two functions
of n remain about the same size.


9.1.4 By comparing the curved area with suitable rectangles beneath the curve,
show that
\[
\frac{1}{2} + \frac{1}{3} + \cdots + \frac{1}{n+1} < \log(n + 1),
\]
and hence that $0 < 1 + \frac{1}{2} + \frac{1}{3} + \cdots + \frac{1}{n} - \log(n + 1) < 1$.


9.1.5 Also show, by a geometric argument, that $1 + \frac{1}{2} + \frac{1}{3} + \cdots + \frac{1}{n} - \log(n + 1)$
increases as n increases, so that it has a finite limit < 1.


The value of the limit is known as Euler’s constant $\gamma$, and $\gamma$ is approximately
0.577. However, little is known about the nature of $\gamma$, not even whether it is
irrational.
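The behavior claimed in Exercises 9.1.4 and 9.1.5 can be observed numerically. This is only a floating-point sketch; the function name `gap` is chosen here for illustration.

```python
import math

def gap(n):
    """Return 1 + 1/2 + ... + 1/n - log(n + 1)."""
    return math.fsum(1 / k for k in range(1, n + 1)) - math.log(n + 1)

for n in (1, 10, 100, 1000, 10**5):
    # The values should increase with n and stay between 0 and 1.
    print(n, gap(n))
```

The printed values creep upward toward roughly 0.577, consistent with the approximate value of Euler's constant quoted above.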
9.2 From Pythagoras to Pi


As mentioned in Section 8.4, the Indian mathematician Madhava found
the series
\[
\tan^{-1} x = x - \frac{x^3}{3} + \frac{x^5}{5} - \frac{x^7}{7} + \cdots,
\]
with its important special case
\[
\frac{\pi}{4} = 1 - \frac{1}{3} + \frac{1}{5} - \frac{1}{7} + \cdots,
\]
in the 15th century. The series for $\pi$ was the first satisfactory answer to
the classical problem of squaring the circle, for although the expression is
infinite (as it must be, by Lindemann’s theorem on the transcendence of
$\pi$), the rule producing successive terms is as finite and transparent as could
be. Sadly, the Indian series became known in the West too late to have
any influence or even to become well known until recently. Rajagopal and
Rangachari (1977, 1986) showed that the series for $\tan^{-1} x$, $\sin x$, and $\cos x$
were known in the Kerala school of Madhava before 1540, and probably 
before 1500. For more recent information on the Kerala school, in the con- 
text of trigonometry and of Indian mathematics in general, see Van Brum- 
melen (2009) and Plofker (2009) respectively. 


In this section we give a streamlined derivation of the Madhava series
for $\pi$, bypassing the trigonometry and using only a little calculus. Our
starting point is the pair of equations found in Section 1.3:
\[
x = \frac{1 - t^2}{1 + t^2}, \qquad y = \frac{2t}{1 + t^2}.
\]

There, we used these equations only for rational values of t, in order to find 
all rational points (x, y) on the unit circle and hence all Pythagorean triples. 
Here, we use them for all real values of t to describe the whole circle, 
except for the point (—1, 0), by two rational functions of t. The beauty 
of this description is that it is amenable to basic calculus—in particular, 
differentiation of rational functions. 


For any curve given parametrically by x = f(t), y = g(t) the distance
$\Delta s$ between points with parameter values t and $t + \Delta t$ is
\[
\Delta s = \sqrt{\left(\frac{\Delta x}{\Delta t}\right)^2 + \left(\frac{\Delta y}{\Delta t}\right)^2}\,\Delta t,
\]
where $\Delta x = f(t + \Delta t) - f(t)$, $\Delta y = g(t + \Delta t) - g(t)$. This is because of the
Pythagorean theorem, which says that the distance $\Delta s$ between the points
(x, y) and $(x + \Delta x, y + \Delta y)$ is given (see Figure 9.3) by
\[
(\Delta s)^2 = (\Delta x)^2 + (\Delta y)^2.
\]

Figure 9.3: Approaching arc length via the Pythagorean theorem


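It follows, by letting $\Delta t \to 0$, that the arc length of the circle between
parameter values t = a and t = b is the integral
\[
\int_a^b \sqrt{\left(\frac{dx}{dt}\right)^2 + \left(\frac{dy}{dt}\right)^2}\,dt. \qquad (*)
\]
Now, differentiating the equations $x = \frac{1 - t^2}{1 + t^2}$, $y = \frac{2t}{1 + t^2}$ gives
\[
\frac{dx}{dt} = -\frac{4t}{(1 + t^2)^2} = -\frac{2y}{1 + t^2}
\quad\text{and}\quad
\frac{dy}{dt} = \frac{2 - 2t^2}{(1 + t^2)^2} = \frac{2x}{1 + t^2}.
\]
When these expressions are substituted in the arc length integral (*) we
get, thanks to the fact that $x^2 + y^2 = 1$,
\[
\int_a^b \frac{2\,dt}{1 + t^2}.
\]
It is also clear, since t is the slope of the line through (−1, 0) in Section
1.3, that we get one quarter of the circle as t runs from 0 to 1. So, defining
$\pi$ to be the length of the semicircle, we have
\[
\frac{\pi}{2} = \int_0^1 \frac{2\,dt}{1 + t^2}, \quad\text{or, equivalently,}\quad
\frac{\pi}{4} = \int_0^1 \frac{dt}{1 + t^2}.
\]
The latter is the integral usually found by trigonometric considerations,
such as $\tan^{-1} 1 = \pi/4$. We now conclude in the usual way, expanding $\frac{1}{1 + t^2}$
as the geometric series $1 - t^2 + t^4 - t^6 + \cdots$:
\[
\begin{aligned}
\frac{\pi}{4} &= \int_0^1 (1 - t^2 + t^4 - t^6 + \cdots)\,dt \\
&= \left[\,t - \frac{t^3}{3} + \frac{t^5}{5} - \frac{t^7}{7} + \cdots\right]_0^1 \\
&= 1 - \frac{1}{3} + \frac{1}{5} - \frac{1}{7} + \cdots.
\end{aligned}
\]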
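Two steps of this derivation lend themselves to a quick check: that the rational parametrization really lands on the unit circle, and that the integral of $2/(1 + t^2)$ from 0 to 1 is a quarter of the circumference. The sketch below uses exact fractions for the first and a simple midpoint rule for the second; the function names are illustrative, and `math.pi` is used only as an independent reference value.

```python
import math
from fractions import Fraction

def circle_point(t):
    """Return (x, y) = ((1 - t^2)/(1 + t^2), 2t/(1 + t^2)) as exact fractions."""
    t = Fraction(t)
    return (1 - t * t) / (1 + t * t), 2 * t / (1 + t * t)

def quarter_arc_length(steps):
    """Midpoint-rule estimate of the integral of 2/(1 + t^2) from t = 0 to t = 1."""
    h = 1.0 / steps
    return sum(2.0 / (1.0 + ((i + 0.5) * h) ** 2) for i in range(steps)) * h

x, y = circle_point(Fraction(1, 2))
print(x, y, x * x + y * y)          # a rational point on the unit circle
print(quarter_arc_length(1000), math.pi / 2)
```

The point for t = 1/2 is (3/5, 4/5), which is the Pythagorean triple (3, 4, 5) in disguise, and the numerical integral should agree closely with $\pi/2$.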
> a ° 5 7 


EXERCISES 


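The proof above skates over one delicate point: the infinite geometric series
expansion
\[
\frac{1}{1 + t^2} = 1 - t^2 + t^4 - \cdots.
\]
This expansion is valid only for |t| < 1, whereas we allowed t = 1 in the
integration. This problem can be fixed by considering finite geometric series,
which can be integrated without fear.


9.2.1 Show that $1 + a + a^2 + \cdots + a^n = \frac{1 - a^{n+1}}{1 - a}$ for $a \neq 1$ and hence that
\[
\frac{1}{1 - a} = 1 + a + a^2 + \cdots + a^n + \frac{a^{n+1}}{1 - a} \quad\text{for } a \neq 1.
\]

9.2.2 Conclude from Exercise 9.2.1 that
\[
\frac{1}{1 + t^2} = 1 - t^2 + t^4 - \cdots + (-1)^n t^{2n} + (-1)^{n+1}\frac{t^{2n+2}}{1 + t^2} \quad\text{for all } t.
\]

9.2.3 Replacing $\frac{1}{1 + t^2}$ in the integral by
$1 - t^2 + t^4 - \cdots + (-1)^n t^{2n} + (-1)^{n+1}\frac{t^{2n+2}}{1 + t^2}$, show that
\[
\frac{\pi}{4} = 1 - \frac{1}{3} + \frac{1}{5} - \cdots + \frac{(-1)^n}{2n + 1}
+ (-1)^{n+1}\int_0^1 \frac{t^{2n+2}}{1 + t^2}\,dt.
\]

9.2.4 Explain why $\int_0^1 \frac{t^{2n+2}}{1 + t^2}\,dt < \int_0^1 t^{2n+2}\,dt$ and hence that
\[
\frac{\pi}{4} = 1 - \frac{1}{3} + \frac{1}{5} - \frac{1}{7} + \cdots.
\]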
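The remainder estimate behind Exercises 9.2.3 and 9.2.4 says that the partial sum ending in the term $(-1)^n/(2n+1)$ differs from $\pi/4$ by less than $\int_0^1 t^{2n+2}\,dt = 1/(2n+3)$. A rough numerical illustration follows, with `math.pi` taken as the reference value and the function name chosen for this example.

```python
import math

def leibniz_partial_sum(n):
    """Return 1 - 1/3 + 1/5 - ... + (-1)**n / (2n + 1)."""
    return sum((-1) ** k / (2 * k + 1) for k in range(n + 1))

for n in (0, 1, 10, 100, 1000):
    error = abs(math.pi / 4 - leibniz_partial_sum(n))
    bound = 1 / (2 * n + 3)   # the integral of t**(2n+2) from 0 to 1
    print(n, error, bound, error < bound)
```

The bound shrinks only like 1/n, which shows why the series, however transparent its rule, is a slow way to compute $\pi$.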
9.3 Power Series


The Indian series for $\tan^{-1} x$ was the first example, apart from geometric
series such as $1 + x + x^2 + x^3 + \cdots = 1/(1 - x)$, of a power series, that is,
the expansion of a function f(x) in powers of x. The idea of power series
turned out to be fruitful not only in the representation of functions but even
in the study of numerical series. Most of the interesting numerical series
turned out to be instances of power series for particular values of x; for
example, the series for $\pi/4$ is the x = 1 instance of the series for $\tan^{-1} x$.
The theory began with the series published by Mercator (1668):
\[
\log(1 + x) = x - \frac{x^2}{2} + \frac{x^3}{3} - \frac{x^4}{4} + \cdots.
\]
As we have seen, this was obtained by integrating the geometric series
\[
\frac{1}{1 + x} = 1 - x + x^2 - x^3 + \cdots
\]
term by term. Now the most important transcendental functions (logs,
exponentials, and the related circular and hyperbolic functions) are
obtained by integration and inversion from algebraic functions, and fairly
simple algebraic functions at that. For example, $e^y$ is the inverse function
of y = log x, and
\[
\log(1 + x) = \int_0^x \frac{dt}{1 + t},
\]
sin y is the inverse function of $y = \sin^{-1} x$ and
\[
\sin^{-1} x = \int_0^x \frac{dt}{\sqrt{1 - t^2}}, \qquad
\tan^{-1} x = \int_0^x \frac{dt}{1 + t^2},
\]


and so on. Thus the key to finding power series is finding series expansions 
of simple algebraic functions. Once this is done, term-by-term integration 
and Newton’s method of series inversion (Section 8.5) yield power series 
for most of the common functions. 

Rational functions, such as $1/(1 + x^2)$, can be expanded using geometric
series. Newton (1665a) made a crucial advance when he discovered the
general binomial theorem,
\[
(1 + x)^p = 1 + px + \frac{p(p-1)}{2!}x^2 + \frac{p(p-1)(p-2)}{3!}x^3 + \cdots,
\]
yielding the expansion of functions such as $1/\sqrt{1 - t^2} = (1 - t^2)^{-1/2}$. This
theorem was also discovered independently by Gregory (1670). Both Newton
and Gregory were inspired by the loose heuristic method of interpolation
used by Wallis (1655a), but they refined it into a result now known as
the Gregory-Newton interpolation formula:
\[
f(a + h) = f(a) + \frac{h}{b}\,\Delta f(a) + \frac{(h/b)(h/b - 1)}{2!}\,\Delta^2 f(a) + \cdots, \qquad (1)
\]
where
\[
\begin{aligned}
\Delta f(a) &= f(a + b) - f(a), \\
\Delta^2 f(a) &= \Delta f(a + b) - \Delta f(a) = f(a + 2b) - 2f(a + b) + f(a), \\
\Delta^3 f(a) &= \Delta^2 f(a + b) - \Delta^2 f(a) = f(a + 3b) - 3f(a + 2b) + 3f(a + b) - f(a), \\
&\;\;\vdots
\end{aligned}
\]

This wonderful formula finds the value of f at an arbitrary point a + h from
the values at an infinite arithmetic sequence of points a, a + b, a + 2b, ....


The first n terms give an nth-degree polynomial in h taking the same
values as f at a, a + b, ..., a + nb. Hence the formula is valid for any f that is
the limit of its own approximating polynomials. This means all functions
representable by power series, provided that the points a, a + b, a + 2b, ...,
are sensibly chosen. (The points $\pi, 2\pi, 3\pi, \ldots$, are a bad choice for sin x,
since the x-axis is a polynomial curve through all of them.)
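A truncated Gregory-Newton series is easy to evaluate from a table of equally spaced values. The sketch below is a modern restatement in exact arithmetic, not a reconstruction of Gregory's or Newton's own computations; the function names are illustrative. It builds the forward differences $\Delta^n f(a)$ and sums the terms with coefficients $\frac{(h/b)(h/b - 1)\cdots(h/b - n + 1)}{n!}$.

```python
from fractions import Fraction

def forward_differences(values):
    """Return [f(a), Δf(a), Δ²f(a), ...] from samples f(a), f(a+b), f(a+2b), ..."""
    diffs, row = [], list(values)
    while row:
        diffs.append(row[0])
        row = [row[i + 1] - row[i] for i in range(len(row) - 1)]
    return diffs

def gregory_newton(values, b, h):
    """Evaluate the Gregory-Newton series, truncated to the given samples, at a + h."""
    u = Fraction(h) / Fraction(b)
    total, coeff = Fraction(0), Fraction(1)
    for n, d in enumerate(forward_differences(values)):
        total += coeff * d
        coeff *= (u - n) / (n + 1)   # next coefficient u(u-1)...(u-n)/(n+1)!
    return total

# Samples of the cubic f(x) = x^3 - 2x + 1 at x = 0, 1, 2, 3 (so a = 0, b = 1).
samples = [x**3 - 2 * x + 1 for x in range(4)]
print(gregory_newton(samples, 1, Fraction(5, 2)))  # f(5/2) = 93/8
```

Because a cubic is determined by four values, the four-term series reproduces it exactly at every point, including points between and beyond the samples.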


Newton discovered the formula (1) after his special investigations on
interpolation that led to the binomial theorem. Independently of Newton,
Gregory discovered the general formula first and derived the binomial
theorem from it (see exercises below). It even appears that Gregory used the
interpolation theorem to discover Taylor’s theorem 44 years before Brook
Taylor. The Taylor series
\[
f(a + h) = f(a) + h f'(a) + \frac{h^2}{2!} f''(a) + \cdots \qquad (2)
\]
is just the limiting case of (1) as $b \to 0$. Indeed, this is how it was derived
by Taylor (1715). The passage from (1) to (2) is simple if one assumes
plausible limiting behavior for the infinite sum. Notice that
\[
\frac{\Delta f(a)}{b} = \frac{f(a + b) - f(a)}{b} \to f'(a) \quad\text{as } b \to 0
\]
and similarly
\[
\frac{\Delta^2 f(a)}{b^2} \to f''(a), \qquad \frac{\Delta^3 f(a)}{b^3} \to f'''(a),
\]
and so on. We write (1) as
\[
f(a + h) = f(a) + h\,\frac{\Delta f(a)}{b} + \frac{h(h - b)}{2!}\,\frac{\Delta^2 f(a)}{b^2} + \cdots
\]
and observe that the nth term
\[
\frac{h(h - b)(h - 2b)\cdots(h - (n - 1)b)}{n!}\,\frac{\Delta^n f(a)}{b^n}
\to \frac{h^n}{n!} f^{(n)}(a) \quad\text{as } b \to 0.
\]
Assuming that the limit of the infinite sum is the sum of these limits, we
then get Taylor’s series (2) as the limit of (1) as $b \to 0$.


An Interpolation on Interpolation 


The importance of interpolation in the development of calculus seems to 
have been greatly underestimated. The topic rarely appears in calculus 
books today, and then only as a numerical method. Yet three of the most 
important founders of calculus, Newton, Gregory, and Leibniz, began their 
work with interpolation, and we have seen how this led to two of their 
most important results, the binomial theorem and Taylor’s theorem. (For 
Leibniz’s work, see Hofmann (1974).) When interpolation is relegated to 
numerical methods, this connection is lost. Of course, interpolation is a 
numerical method in practice, when one uses only a few terms of the 
Gregory—Newton series, but the full series is exact and hence much more 
interesting. It was interest in infinite expansions per se that distinguished 
Newton, Gregory, and Leibniz (as well as Wallis) from their predecessors 
in interpolation. 

Interpolation goes back to ancient times as a method for estimating the 
values of functions between known values. But perhaps the first to glimpse 
the possibility of exact interpolation were Thomas Harriot (1560-1621) 
and Henry Briggs (1556-1630). A formula has been found in Harriot’s 
papers that is equivalent to the first terms of the Gregory—Newton series; 
see Lohne (1965). Lohne dates this work of Harriot at 1611. Briggs may 
have learned something about interpolation from Harriot when the two 
were at Oxford around 1620. Briggs’s Arithmetica logarithmica (1624), 
which is concerned with the calculation of logarithms, uses series for 
interpolation, and in the process gives the first instance of the binomial
theorem for a fractional exponent
\[
(1 + x)^{1/2} = 1 + \frac{1}{2}x - \frac{1\cdot 1}{2\cdot 4}x^2 + \frac{1\cdot 1\cdot 3}{2\cdot 4\cdot 6}x^3
- \frac{1\cdot 1\cdot 3\cdot 5}{2\cdot 4\cdot 6\cdot 8}x^4 + \cdots.
\]
Gregory knew of Briggs’s work, and Newton certainly could have known
of it, though no strong evidence that he did has yet been found. For more
information on the history of interpolation, see Whiteside (1961) and
Goldstine (1977).


EXERCISES 


Here is how to derive the general binomial series from the Gregory-Newton
interpolation formula.


9.3.1 Show that
\[
\Delta^n f(a) = \sum_{i=0}^{n} (-1)^{n-i}\binom{n}{i} f(a + ib),
\]
where $\binom{n}{i}$ is the ordinary binomial coefficient.

9.3.2 If a = 0, b = 1, and $f(x) = (1 + k)^x$, show that $\Delta^n f(0) = k^n$ using the finite
binomial series
\[
(1 + h)^n = \sum_{i=0}^{n} \binom{n}{i} h^i.
\]

9.3.3 Deduce the general binomial series
\[
(1 + k)^h = 1 + hk + \frac{h(h - 1)}{2!}k^2 + \frac{h(h - 1)(h - 2)}{3!}k^3 + \cdots
\]
using the Gregory-Newton interpolation formula.
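The coefficients of the general binomial series are generated by a one-line recurrence, which makes it easy to compare them with the fractional-exponent series attributed to Briggs and with the expansion of $(1 - t^2)^{-1/2}$ used for $\sin^{-1} x$. This is a small sketch in exact arithmetic; the function name is chosen here for illustration.

```python
from fractions import Fraction

def binomial_series_coeffs(p, terms):
    """Return the coefficients of 1, x, x^2, ... in (1 + x)^p for rational p."""
    p = Fraction(p)
    coeffs, c = [], Fraction(1)
    for n in range(terms):
        coeffs.append(c)
        c *= (p - n) / (n + 1)   # p(p-1)...(p-n)/(n+1)! from p(p-1)...(p-n+1)/n!
    return coeffs

print(binomial_series_coeffs(Fraction(1, 2), 5))    # 1, 1/2, -1/8, 1/16, -5/128
print(binomial_series_coeffs(Fraction(-1, 2), 4))   # 1, -1/2, 3/8, -5/16
print(binomial_series_coeffs(3, 5))                 # 1, 3, 3, 1, 0
```

For a non-negative integer exponent the coefficients eventually vanish and the series is the ordinary finite binomial expansion; for a fractional exponent they never do.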


9.4 Fractional Power Series 


Power series helped to make mathematicians aware of the function concept
by revealing the generality of the expression $a_0 + a_1x + a_2x^2 + \cdots$.
However, not every function f(x) is expressible as $a_0 + a_1x + a_2x^2 + \cdots$.
This is obvious for functions that tend to infinity as $x \to 0$, since the power
series has value $a_0$ when x = 0. For other functions, such as $f(x) = x^{1/2}$,
the behavior at 0 disallows a power-series expansion for a more subtle reason.
These functions have branching behavior at 0; they are many-valued,
and hence they are not functions in the strict sense. The function $x^{1/2}$, for
example, is two-valued because each number has two square roots, one the
negative of the other.

Such behavior does not occur for a power series $a_0 + a_1x + a_2x^2 + \cdots$,
which has only one value for each value of x. All fractional powers of
x are many-valued: $x^{1/3}$ is three-valued, $x^{1/4}$ is four-valued, and so on.
Many-valued behavior is typical of algebraic functions, where y is said
to be an algebraic function of x if x and y satisfy a polynomial equation
P(x, y) = 0. Since most polynomial equations are not solvable by radicals
(Section 5.7), most algebraic functions are not given by finite expressions
built from $+$, $-$, $\times$, $\div$, and fractional powers.

However, it was the remarkable discovery of Newton (1671) that any
algebraic function y can be expressed as a fractional power series in x:
\[
y = a_0 + a_1x^{r_1} + a_2x^{r_2} + a_3x^{r_3} + \cdots,
\]
where $r_1, r_2, r_3, \ldots$, are rational numbers. Furthermore, the series can be
rewritten in the form
\[
\begin{aligned}
a_0 &+ b_1x^{s_1}(c_{10} + c_{11}x + c_{12}x^2 + \cdots) \\
&+ b_2x^{s_2}(c_{20} + c_{21}x + c_{22}x^2 + \cdots) \\
&\;\;\vdots \\
&+ b_nx^{s_n}(c_{n0} + c_{n1}x + c_{n2}x^2 + \cdots),
\end{aligned}
\]
that is, as a finite sum of ordinary power series with fractional powers of x
as multipliers. Near x = 0, y behaves like a finite sum of fractional powers.
For example, if $y^2(1 + x)^2 = x$, we have
\[
y = \frac{x^{1/2}}{1 + x} = x^{1/2}(1 - x + x^2 - x^3 + \cdots),
\]
and near the origin, y has behavior similar to $x^{1/2}$; in particular there
are two values of y for each x. Newton’s contribution was an ingenious
algorithm for obtaining the successive powers of x. The fractional pow- 
ers themselves were not properly understood until the variables x and y 
were taken to be complex. This was done in the 19th century, and on this 
basis a more rigorous derivation of Newton’s series was given by Puiseux 
(1850). For this reason, the fractional power-series expansions of algebraic 
functions are now called Puiseux expansions. 
EXERCISE


The impossibility of an ordinary power series for $x^{1/2}$ can be shown as follows.


9.4.1 Any ordinary power-series expansion of $x^{1/2}$ would have to be of the form
\[
x^{1/2} = a_1x + a_2x^2 + a_3x^3 + \cdots
\]
because $x^{1/2} = 0$ when x = 0. Now square both sides and deduce a contradiction.


9.5 Summation of Series 


The results on infinite series seen so far are mostly decompositions or
expansions rather than summations. That is, a known quantity or function
is decomposed into an infinite series. Solutions of the converse problem,
finding the sum of a given series, were comparatively rare. Archimedes’
summation of $1 + 1/4 + 1/4^2 + \cdots$ was one. Perhaps the next were of series
such as $\frac{1}{1\cdot 2} + \frac{1}{2\cdot 3} + \cdots + \frac{1}{n(n+1)} + \cdots$, given by Mengoli (1650).
The series $\sum 1/n(n + 1)$ is easily summed because of the happy accident
that
\[
\frac{1}{n(n + 1)} = \frac{1}{n} - \frac{1}{n + 1},
\]
whence
\[
\begin{aligned}
\frac{1}{1\cdot 2} + \frac{1}{2\cdot 3} + \cdots + \frac{1}{n(n + 1)}
&= \left(1 - \frac{1}{2}\right) + \left(\frac{1}{2} - \frac{1}{3}\right) + \cdots + \left(\frac{1}{n} - \frac{1}{n + 1}\right) \\
&= 1 - \frac{1}{n + 1}.
\end{aligned}
\]

By letting $n \to \infty$ we then obtain the sum 1 for the infinite series.
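The telescoping can be confirmed exactly for as many terms as one likes. In this brief sketch the function name is made up for the example.

```python
from fractions import Fraction

def mengoli_partial_sum(n):
    """Return 1/(1*2) + 1/(2*3) + ... + 1/(n(n+1)) as an exact fraction."""
    return sum(Fraction(1, k * (k + 1)) for k in range(1, n + 1))

for n in (1, 2, 3, 10, 100):
    # Each partial sum should equal 1 - 1/(n+1).
    print(n, mengoli_partial_sum(n), 1 - Fraction(1, n + 1))
```

The partial sums n/(n + 1) approach 1 from below, matching the sum of the infinite series.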

The first really tough summation problem was $1 + 1/2^2 + 1/3^2 + \cdots$.
Mengoli tackled this without success, as did the brothers Jakob and Johann
Bernoulli in a series of papers (1704). The Bernoulli brothers were able to
sum similar series, rediscovering Mengoli’s $\sum 1/n(n+1)$ and also summing
$\sum 1/(n^2 - 1)$, but for $\sum 1/n^2$ itself they could obtain only trivial results such
as
\[
\frac{1}{2^2} + \frac{1}{4^2} + \frac{1}{6^2} + \cdots = \frac{1}{4}\left(1 + \frac{1}{2^2} + \frac{1}{3^2} + \cdots\right).
\]
The solution was finally obtained by Euler (1734), long after the death
of Jakob Bernoulli, and Johann Bernoulli exclaimed, “In this way my
brother’s most ardent wish is satisfied ... if only my brother were still
alive!” (Johann Bernoulli, Opera, Vol. 4, p. 22). In fact, after hearing
that the sum is $\pi^2/6$, Johann Bernoulli himself discovered a proof, which
turned out to be the same as Euler’s.

Euler (1707-1783) was the greatest virtuoso of series manipulation,
and his first summation of $1 + 1/2^2 + 1/3^2 + \cdots$ was one of his most
audacious. (Later he gave more rigorous proofs.) Consider the equation
$$\frac{\sin\sqrt{x}}{\sqrt{x}} = 1 - \frac{x}{3!} + \frac{x^2}{5!} - \frac{x^3}{7!} + \cdots = 0, \tag{1}$$
easily obtained from the sine series of Section 8.5. This equation has roots
$x_1 = \pi^2$, $x_2 = (2\pi)^2$, $x_3 = (3\pi)^2$, ..., but not 0, because $\sin\sqrt{x}/\sqrt{x} \to 1$ as
$x \to 0$. Now if a polynomial equation
$$1 + a_1 x + a_2 x^2 + \cdots + a_n x^n = 0$$
has roots $x = x_1, x_2, \ldots, x_n$, Descartes’s factor theorem (Section 5.7) gives
$$1 + a_1 x + a_2 x^2 + \cdots + a_n x^n = \left(1 - \frac{x}{x_1}\right)\left(1 - \frac{x}{x_2}\right)\cdots\left(1 - \frac{x}{x_n}\right). \tag{2}$$
Also
$$\frac{1}{x_1} + \frac{1}{x_2} + \cdots + \frac{1}{x_n} = -\text{coefficient of } x = -a_1,$$


since each x term in the expansion of the right-hand side of (2) comes 
from a term —x/x; in one factor multiplied by 1’s in all the other factors. 
Assuming that this is also true of the “infinite polynomial” equation (1), 
we get 


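$$\frac{1}{x_1} + \frac{1}{x_2} + \frac{1}{x_3} + \cdots = -\text{coefficient of } x = -\left(-\frac{1}{3!}\right),$$
that is,
$$\frac{1}{\pi^2} + \frac{1}{(2\pi)^2} + \frac{1}{(3\pi)^2} + \cdots = \frac{1}{6}.$$
Hence
$$1 + \frac{1}{2^2} + \frac{1}{3^2} + \cdots = \frac{\pi^2}{6} \quad \text{(Q.E.D.!)}$$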
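Euler's leap from polynomials to the “infinite polynomial” can at least be tested numerically. The following sketch, with function names chosen here purely for illustration, sums the reciprocals of the roots $(k\pi)^2$ and the terms $1/k^2$, so that the results can be compared with 1/6 and $\pi^2/6$. It is a floating-point sanity check, not a proof.

```python
import math


def reciprocal_root_sum(n: int) -> float:
    """Sum of 1/x_k over the first n roots x_k = (k*pi)^2 of sin(sqrt(x))/sqrt(x)."""
    return sum(1.0 / (k * math.pi) ** 2 for k in range(1, n + 1))


def basel_partial_sum(n: int) -> float:
    """Partial sum 1 + 1/2^2 + ... + 1/n^2."""
    return sum(1.0 / (k * k) for k in range(1, n + 1))


if __name__ == "__main__":
    n = 100_000
    print("sum of 1/x_k :", reciprocal_root_sum(n), " target 1/6    :", 1 / 6)
    print("sum of 1/k^2 :", basel_partial_sum(n), " target pi^2/6 :", math.pi ** 2 / 6)
```

The partial sums approach their limits slowly, with an error of roughly 1/n for the second series, which is one reason the exact value was so hard to guess.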


The extraordinary and beautiful world of formulas revealed by Euler 
is today somewhat neglected in mathematics instruction. For a history of 
mathematics with an emphasis on infinite formulas, see Roy (2011). 
EXERCISES


Euler’s reasoning also leads to a correct infinite product formula for sin x,
which in turn gives the Wallis product for $\pi/4$ (Section 8.4).


9.5.1 Deduce an infinite product for $\frac{\sin\sqrt{x}}{\sqrt{x}}$ from Euler’s reasoning, and hence
show that
$$\sin x = x\left(1 - \frac{x^2}{\pi^2}\right)\left(1 - \frac{x^2}{2^2\pi^2}\right)\left(1 - \frac{x^2}{3^2\pi^2}\right)\cdots.$$


9.5.2 By substituting $x = \pi/2$ in the infinite product for sin x, show that
$$\frac{2}{\pi} = \frac{1\cdot 3}{2\cdot 2}\cdot\frac{3\cdot 5}{4\cdot 4}\cdot\frac{5\cdot 7}{6\cdot 6}\cdots,$$
and hence obtain Wallis’s product for $\pi/4$.


9.6 The Zeta Function 


The sum $1 + \frac{1}{2^2} + \frac{1}{3^2} + \cdots$ first drew Euler’s attention to the function now
known as the zeta function:
$$\zeta(s) = 1 + \frac{1}{2^s} + \frac{1}{3^s} + \frac{1}{4^s} + \cdots.$$
This function is well-defined for real values of s > 1, and Euler’s initial
discovery was that $\zeta(2) = \pi^2/6$. Later, he also found the values of $\zeta(s)$
for s = 4, 6, 8, .... But his most spectacular discovery was the product
formula of Euler (1748a), p. 288, showing that $\zeta(s)$ encodes the sequence
2, 3, 5, 7, 11, ..., of prime numbers. Euler’s formula is
$$\frac{1}{1 - 1/2^s}\cdot\frac{1}{1 - 1/3^s}\cdot\frac{1}{1 - 1/5^s}\cdot\frac{1}{1 - 1/7^s}\cdot\frac{1}{1 - 1/11^s}\cdots = 1 + \frac{1}{2^s} + \frac{1}{3^s} + \frac{1}{4^s} + \cdots.$$


The factors on the left-hand side are $(1 - 1/p_n^s)^{-1}$, where $p_n$ is the nth
prime. To see why these factors give the terms on the right-hand side we
expand each of them as a geometric series
$$\frac{1}{1 - 1/p_n^s} = 1 + \frac{1}{p_n^s} + \frac{1}{p_n^{2s}} + \frac{1}{p_n^{3s}} + \cdots.$$
Multiplying all these series together, we get the reciprocal of each possible
product of primes, to the sth power, exactly once. That is, the left-hand side
is the sum
$$1 + \sum \frac{1}{(p_1^{m_1} p_2^{m_2}\cdots p_r^{m_r})^s},$$
in which each product $p_1^{m_1} p_2^{m_2}\cdots p_r^{m_r}$ of primes occurs exactly once. But
each natural number $\ge 2$ is expressible in just one way as a product of
primes (Section 3.3), hence the latter sum equals the right-hand side of
Euler’s formula
$$1 + \frac{1}{2^s} + \frac{1}{3^s} + \frac{1}{4^s} + \cdots.$$
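A truncated version of Euler's formula can be compared numerically on both sides. The sketch below takes s = 2 and arbitrary cutoffs purely for illustration; `primes_up_to`, `euler_product`, and `zeta_partial_sum` are hypothetical helper names, not notation from the text.

```python
import math


def primes_up_to(n: int) -> list:
    """Sieve of Eratosthenes: all primes p <= n."""
    if n < 2:
        return []
    sieve = [True] * (n + 1)
    sieve[0] = sieve[1] = False
    for i in range(2, math.isqrt(n) + 1):
        if sieve[i]:
            for j in range(i * i, n + 1, i):
                sieve[j] = False
    return [i for i, is_prime in enumerate(sieve) if is_prime]


def euler_product(s: float, bound: int) -> float:
    """Product of 1/(1 - 1/p^s) over the primes p <= bound."""
    product = 1.0
    for p in primes_up_to(bound):
        product *= 1.0 / (1.0 - p ** (-s))
    return product


def zeta_partial_sum(s: float, terms: int) -> float:
    """Partial sum 1 + 1/2^s + ... + 1/terms^s."""
    return sum(k ** (-s) for k in range(1, terms + 1))


if __name__ == "__main__":
    print("product over primes:", euler_product(2.0, 10_000))
    print("sum over integers  :", zeta_partial_sum(2.0, 100_000))
    print("pi^2 / 6           :", math.pi ** 2 / 6)
```

Both truncations approach the same limit for s > 1. For s close to 1 the convergence becomes very slow, in line with the divergence at s = 1.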

Initially the exponent s > 1 was there only to ensure convergence.
We saw in Section 9.1 that $\zeta(s)$ diverges when s = 1; it converges when
s > 1. Riemann (1859) discovered that $\zeta(s)$ becomes much more powerful
when s is taken to be a complex variable. In recognition of this, $\zeta(s)$ is
often called the Riemann zeta function. As mentioned above, the result
of Section 9.5 shows $\zeta(2) = \pi^2/6$. The values of $\zeta(4)$, $\zeta(6)$, $\zeta(8)$, ...,
also found by Euler, turn out to be rational multiples of $\pi^4$, $\pi^6$, $\pi^8$, ...,
respectively. The values of $\zeta(3)$, $\zeta(5)$, ... have no known relationship to $\pi$
or other standard constants, though Apéry (1981) showed that $\zeta(3)$ is irrational.
The most famous conjecture about $\zeta(s)$, and one of the most sought-after
results in mathematics today, is known as the Riemann hypothesis:
$\zeta(s) = 0$ only when s has real part $\frac{1}{2}$ (excluding the trivial zeros described
below).
EXERCISES 


Although $\zeta(s)$ is not defined for s = 1 (because this gives the divergent series
$1 + \frac{1}{2} + \frac{1}{3} + \frac{1}{4} + \cdots$), this situation can be exploited to give a new proof that there
are infinitely many primes. (Thus the Euler product formula encapsulates two
apparently unrelated results: unique prime factorization, and the infinite number
of primes.)


9.6.1 (Euler) Show that if there are only finitely many primes $p_1, \ldots, p_k$, then
$$\frac{1}{1 - 1/p_1}\cdot\frac{1}{1 - 1/p_2}\cdots\frac{1}{1 - 1/p_k} = 1 + \frac{1}{2} + \frac{1}{3} + \frac{1}{4} + \cdots.$$


Deduce that there are infinitely many primes. 


The statement of the Riemann hypothesis needs some qualification, because
$\zeta(s)$ can be defined for certain values of s for which the series $1 + \frac{1}{2^s} + \frac{1}{3^s} + \cdots$
is not meaningful. This follows from the formula
$$\zeta(1 - s) = 2(2\pi)^{-s}\cos\frac{s\pi}{2}\,\Gamma(s)\,\zeta(s),$$
discovered by Riemann and called the functional equation for the zeta function.
The functional equation enables us to define $\zeta(1 - s)$ when $\zeta(s)$ is known, and it
also shows that there are certain trivial zeros of $\zeta(1 - s)$, namely, where s satisfies
$\cos\frac{s\pi}{2} = 0$.


9.6.2 Which s give a trivial zero of $\zeta(1 - s)$?


The function $\Gamma$ in the functional equation is the gamma function, introduced
by Euler to extend the factorial function, $\Gamma(n) = (n - 1)!$, to non-integer values
of n. An amusing consequence of the functional equation is that we can assign
values to certain divergent series, such as $1 + 2 + 3 + 4 + \cdots$, by interpreting them
as $\zeta(1 - s)$, then reinterpreting $\zeta(1 - s)$ by the functional equation.


9.6.3 By suitable reinterpretation, show that
$$1 + 2 + 3 + 4 + \cdots = -1/12.$$


Euler (1770a), p. 157, found another trick for the zeta function: giving a natural
formula for the seemingly unnatural Euler constant $\gamma$. Recall from Exercise 9.1.5
that $\gamma$ is defined to be the limit of $1 + \frac{1}{2} + \frac{1}{3} + \cdots + \frac{1}{n} - \log(n + 1)$ as $n \to \infty$.


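9.6.4 Using the Mercator series for $\log\left(1 + \frac{1}{k}\right)$, show that
$$\frac{1}{k} - \log(k + 1) + \log(k) = \frac{1}{2k^2} - \frac{1}{3k^3} + \frac{1}{4k^4} - \cdots.$$


9.6.5 By adding the instances of the formula in Exercise 9.6.4 from k = 1 to
k = n, show that
$$\left(1 + \frac{1}{2} + \frac{1}{3} + \cdots + \frac{1}{n}\right) - \log(n + 1) = \frac{1}{2}\left(\frac{1}{1^2} + \frac{1}{2^2} + \cdots + \frac{1}{n^2}\right) - \frac{1}{3}\left(\frac{1}{1^3} + \frac{1}{2^3} + \cdots + \frac{1}{n^3}\right) + \frac{1}{4}\left(\frac{1}{1^4} + \frac{1}{2^4} + \cdots + \frac{1}{n^4}\right) - \cdots.$$


9.6.6 Deduce from Exercise 9.6.5 that $\gamma = \frac{\zeta(2)}{2} - \frac{\zeta(3)}{3} + \frac{\zeta(4)}{4} - \frac{\zeta(5)}{5} + \cdots$.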


Elliptic Curves and Functions


PREVIEW 


Number theory revived in Europe with the rediscovery of Diophantus by 
Bombelli, and the publication of a new edition by Bachet de Méziriac 
(1621). It was this book that inspired Fermat and launched number theory 
as a modern mathematical discipline—one that draws on resources from 
all parts of mathematics. 

Fermat mastered and extended the techniques of Diophantus, such as 
the chord and tangent method for finding rational points on cubic curves. 
This was the beginning of the modern theory of elliptic curves, which take 
their name, in a roundabout way, from what are called elliptic functions. 

Elliptic functions, like many innovations in mathematics, arose as a
way around an impasse: that no “known” function f(x) has derivative
$1/\sqrt{1 - x^4}$. Eventually, mathematicians accepted the fact that $\int dx/\sqrt{1 - x^4}$ is
a new function. It is one of a family called the elliptic integrals, because
one of them is the integral that defines the arc length of the ellipse.

Around 1800 Gauss realized that, rather than studying $u = \int_0^x dt/\sqrt{1 - t^4}$,
one should study its inverse function x as a function of u (just as one
should study the sine function rather than the arcsine integral $\int_0^x dt/\sqrt{1 - t^2}$).
Gauss wrote x = sl(u) and found that the function sl, like the sine, is
periodic; that is, $sl(u + 2\varpi) = sl(u)$, where $\varpi$ is a certain real number.

More surprisingly, sl has second period $2i\varpi$, so sl is better viewed as
a function of complex numbers. These results first became widely known
when they were rediscovered, published, and extended by Abel and Jacobi
in the 1820s.
10.1 Fermat’s Last Theorem


On the other hand, it is impossible for a cube to be written as 
a sum of two cubes or a fourth power to be written as a sum 
of two fourth powers or, in general, for any number which 
is a power higher than second to be written as a sum of two 
like powers. I have a truly marvellous demonstration of this 
proposition which this margin is too small to contain. 


Fermat (1670), p. 241 


This remark, written in the margin of his copy of Bachet’s Diophantus 
when he was studying that work in the late 1630s, is the second item in Fer- 
mat’s Observations on Diophantus, published posthumously in 1670. Fer- 
mat was responding to Diophantus’s treatment of the problem of express- 
ing a square as a sum of two squares. As we saw in Chapter 1, this is the 
problem of finding Pythagorean triples (a, b, c) or, equivalently, of finding 
the rational points (a/c, b/c) on the circle $x^2 + y^2 = 1$.

Fermat’s last theorem, the claim that there are no triples (a, b, c) of
positive integers such that

$$a^n + b^n = c^n, \quad \text{where } n > 2 \text{ is an integer},$$


became the most famous problem in mathematics. It was finally proved by 
Wiles (1995) and then only with a deep and unexpected intervention by the 
theory of elliptic curves, which we introduce below in Section 10.5. As far 
as we know, Fermat himself proved it only for n = 4. However, his proof 
was interesting and fruitful enough to be worth describing here—not least 
because it too touches on elliptic curves. It began with a problem about 
right-angled triangles. 


Rational Right-Angled Triangles 


The area of a right-angled triangle the sides of which are ratio- 
nal numbers cannot be a square number. This proposition, 
which is my own discovery, I have at length succeeded in 
proving, though not without much labour and hard thinking. I 
give the proof here, as this method will enable extraordinary 
developments to be made in the theory of numbers. 


Fermat (1670), p. 271

This is number 45 of Fermat’s Observations on Diophantus, responding
to a problem posed by Bachet: to find a right-angled triangle whose area 
equals a given number. The observation is important not only for the the- 
orem and the method announced, but also because it is followed by the 
only reasonably complete proof left by Fermat in number theory. As a 
bonus, the proof implicitly settles Fermat’s last theorem for n = 4 (see 
exercises) and is an excellent illustration of his method of infinite descent, 
which did indeed lead to extraordinary developments in the theory of num- 
bers. In what follows, the statements that make up Fermat’s proof, appear- 
ing indented like the quote above, are expanded and expressed in modern 
notation following the reconstruction of Zeuthen (1903), p. 163. We use 
the translation of Fermat given by Heath (1910), p. 293, in his version of 
the reconstruction. 


If the area of a right-angled triangle were a square, there would 

exist two biquadrates the difference of which would be a square 
number. Consequently there would exist two square numbers 

the sum and difference of which would be squares. 


By choosing a suitable unit of length, we can express the sides of a rational
right triangle as a Pythagorean triple of relatively prime integers $p^2 - q^2$,
$2pq$, $p^2 + q^2$, as noted in Section 1.2. Since their gcd is 1, gcd(p, q) = 1
also. Therefore, since 2pq is even, $p^2 - q^2$ and its factors $p + q$, $p - q$ must
be odd. Also, no two of $p$, $q$, $p + q$, $p - q$ have a common prime divisor,
otherwise p, q would. Then if the area $pq(p + q)(p - q)$ is a square, its
factors must all be squares:

$$p = r^2, \quad q = s^2, \quad p + q = t^2, \quad p - q = u^2. \tag{1}$$

Thus the sum and difference of the squares $r^2$, $s^2$ are also squares, so

$$r^4 - s^4 = (r^2 + s^2)(r^2 - s^2) = t^2u^2 = v^2.$$


Therefore we should have a square number which would be 
equal to the sum of a square and the double of another square, 
while the squares of which this sum is made up would them- 
selves have a square number for their sum. 


From (1) we have

$$t^2 - u^2 = 2s^2, \quad \text{that is,} \quad t^2 = u^2 + 2s^2. \tag{2}$$

And also from (1),

$$u^2 + s^2 = r^2.$$


But if a square is made up of a square and the double of 
another square, its side, as I can very easily prove, is also 
made up of a square and the double of another square. 


Since $(t + u)(t - u) = t^2 - u^2 = 2s^2$ from (2), $(t + u)(t - u)$ is even. Then
one of $t + u$, $t - u$ is even, and consequently so is the other. Put

$$t + u = 2w, \quad t - u = 2x. \tag{3}$$

Then
$$s^2 = (t + u)(t - u)/2 = 2wx.$$

Tracing back through (3), (2), (1) we see that any common divisor of w, x
would also be common to $t$, $u$, to $t^2$, $u^2$, to $r^2$, $s^2$, and hence to p, q. Thus
w, x are relatively prime and therefore, since wx is twice a square, we have
either

$$w = y^2,\ x = 2z^2 \quad \text{or} \quad w = 2z^2,\ x = y^2.$$

In either case,
$$t = w + x = y^2 + 2z^2. \tag{4}$$


From this we conclude that the said side is the sum of the 
sides about the right angle in a right-angled triangle, and that 
the simple square contained in the sum is the base, and the 
double of the other square the perpendicular. 


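If we let $y^2$, $2z^2$ be the sides of a right triangle, then the hypotenuse h
satisfies

$$\begin{aligned}
h^2 = (y^2)^2 + (2z^2)^2 &= \tfrac{1}{2}\left[(y^2 + 2z^2)^2 + (y^2 - 2z^2)^2\right] \\
&= \tfrac{1}{2}(t^2 + u^2) && \text{by (3) and (4)} \\
&= r^2 && \text{by (1).}
\end{aligned}$$

Hence h = r and the triangle is rational.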


This right-angled triangle will thus be formed from two squares, 
the sum and difference of which will be squares. But both 
these squares can be shown to be smaller than the squares 
originally assumed to be such that both their sum and their 
difference are squares. 
The original squares with sum and difference equal to squares were
$p = r^2$, $q = s^2$, coming from the perpendicular sides $p^2 - q^2$ and $2pq$ of
the rational right triangle whose area was assumed to be a square. We now
have a rational (indeed integral) right triangle with perpendicular sides
$y^2$, $2z^2$ whose area $y^2z^2$ is also a square. This triangle is smaller, since
its hypotenuse r is less than side 2pq of the original triangle, so it gives
a smaller pair of (integer) squares $p'$, $q'$, whose sum and difference are
squares.


Thus, if there exist two squares such that the sum and differ- 
ence are both squares, there will also exist two other integer 
squares which have the same property but a smaller sum. By 
the same reasoning we find a sum still smaller than the last 
found, and we can go on ad infinitum finding integer square 
numbers smaller and smaller with the same property. This is, 
however, impossible because there cannot be an infinite series 
of numbers smaller than any given integer we please. 


This contradiction means that the initial assumption of a rational right tri- 
angle with square area is false. The versions of Zeuthen and Heath proceed 
more directly to a contradiction than Fermat by observing that the descent 
from the hypothetical initial triangle to the one with area $y^2z^2$ can be iterated
to give an infinite descending sequence of integer areas. Weil (1984),
p. 77, shortens the proof even further. 
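
Fermat's theorem can be spot-checked by brute force. The sketch below assumes the parameterization $p^2 - q^2$, $2pq$, $p^2 + q^2$ used in the proof, for which the area is $pq(p^2 - q^2)$, and searches small relatively prime p > q of opposite parity for a square area. Scaling a triangle by k multiplies its area by $k^2$, so primitive triples suffice. The function names and the search bound are illustrative assumptions, and an empty result is only evidence, not a proof.

```python
from math import gcd, isqrt


def is_square(n: int) -> bool:
    """True if the nonnegative integer n is a perfect square."""
    r = isqrt(n)
    return r * r == n


def square_area_triangles(limit: int) -> list:
    """Pairs (p, q), q < p <= limit, giving a primitive triple with square area."""
    found = []
    for p in range(2, limit + 1):
        for q in range(1, p):
            # Primitive triples need gcd(p, q) = 1 and p, q of opposite parity.
            if gcd(p, q) != 1 or (p - q) % 2 == 0:
                continue
            area = p * q * (p * p - q * q)  # half of (p^2 - q^2) * 2pq
            if is_square(area):
                found.append((p, q))
    return found


if __name__ == "__main__":
    print(square_area_triangles(200))
```

In line with the theorem, the search is expected to find nothing, whatever bound is chosen.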


EXERCISES 


Two of the propositions that arise in the descent from the hypothetical ratio- 
nal right triangle with square area are of independent interest and are also false 
because they imply the existence of such a triangle. 


10.1.1 Show that the existence of squares $r^2$ and $s^2$ for which $r^2 + s^2$ and $r^2 - s^2$ are
both squares implies the existence of a rational right triangle with square
area.


10.1.2 Show that a nonzero integer solution of $r^4 - s^4 = v^2$ implies the existence
of a rational right triangle with square area. (Hint: It is the same triangle as
in Exercise 10.1.1.)

10.1.3 From Exercise 10.1.2, deduce Fermat’s last theorem for n = 4.

The impossibility of a nonzero integer solution $r^4 - s^4 = v^2$ can also be shown
by a more direct descent that avoids some of the steps used by Fermat. The main
steps are as follows, assuming r, s, and hence v have no common prime divisor.

$$\begin{aligned}
r^4 - s^4 = v^2 &\implies r^2 = a^2 + b^2,\quad s^2 = 2ab,\quad v = a^2 - b^2 \\
&\qquad\quad \text{for some nonzero integers } a, b \\
&\implies a = c^2 - d^2,\quad b = 2cd \\
&\qquad\quad \text{for some nonzero integers } c, d \\
&\implies c = e^2,\ d = f^2 \text{ and } c^2 - d^2 \text{ are squares} \\
&\qquad\quad \text{because } s^2 = 4cd(c^2 - d^2) \\
&\qquad\quad \text{and } c,\ d,\ c^2 - d^2 \text{ have no common prime divisor} \\
&\implies e^4 - f^4 = g^2 \\
&\qquad\quad \text{for an integer pair } (e, f) \text{ smaller than } (r, s).
\end{aligned}$$


10.1.4 Justify the steps in this argument. 


10.2 Rational Points on Cubics of Genus 0 


It is doubtful that Fermat had a proof of Fermat’s last theorem because most 
of his work deals with curves of low degree ($\le 4$), and it is highly unlikely
that he could have foreseen what actually happened in the 1980s: a reduc- 
tion of the nth-degree Fermat problem to a question about cubic curves. 
Fermat did not even talk about rational points on curves. Nevertheless, this 
is the most natural way to interpret his solutions of Diophantine equations 
and to link them with earlier and later results in the same vein by Diophan- 
tus and Euler, respectively. We have already described methods for finding 
rational points on curves of degree 2 (in Section 1.3) and 3 (in Section 3.5). 
Now we reexamine them from the point of view of genus, which becomes 
increasingly important as curves of higher degree are considered. 

We cannot define genus yet (for that, see Chapters 11 and 15) but 
it measures the algebraic complexity of a curve. In particular, curves of 
genus 0 are those that can be parameterized by rational functions. 

One property of a curve C of degree 2 observed in Section 1.3 is that 
a rational line L through a rational point P on C meets C in a second 
rational point, provided the equation of C has rational coefficients. Also, 
one obtains all rational points Q on C in this way by rotating L about P.
This construction has another important consequence, not depending on 
the coefficients of C or L: expressing the x and y coordinates of Q in terms 
of the slope t of L gives a parameterization of C by rational functions of t 
(bear in mind that a rational function need not have rational coefficients). 

Figure 10.1: Parameterizing the circle (the figure shows the line y = t(x + 1) meeting the circle at Q)



For example, this construction on the circle $x^2 + y^2 = 1$ in Section 1.3
gave the parameterization

$$x = \frac{1 - t^2}{1 + t^2}, \quad y = \frac{2t}{1 + t^2}$$
(Figure 10.1), which we used in Section 9.2 to find a formula for $\pi$. Genus
0 curves can be defined as those that admit parameterization by rational
functions. I will now show that genus 0 includes some cubic curves by
applying a similar construction to the folium of Descartes.
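
The circle parameterization can be tried out with exact fractions: each rational slope t gives a rational point on the circle. This is a minimal sketch; the name `circle_point` is chosen here for illustration.

```python
from fractions import Fraction


def circle_point(t) -> tuple:
    """Point ((1 - t^2)/(1 + t^2), 2t/(1 + t^2)) on x^2 + y^2 = 1 for slope t."""
    t = Fraction(t)
    denom = 1 + t * t  # never zero for rational t
    return (1 - t * t) / denom, 2 * t / denom


if __name__ == "__main__":
    for t in (Fraction(1, 2), Fraction(2, 3), Fraction(-3, 7)):
        x, y = circle_point(t)
        # Exact arithmetic: the point lies on the circle with no rounding error.
        assert x * x + y * y == 1
        print(t, (x, y))
```

Clearing denominators in such a point (a/c, b/c) gives a Pythagorean triple (a, b, c). For example, t = 1/2 yields (3/5, 4/5).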
The folium was defined in Section 6.3 as the curve with equation 


$$x^3 + y^3 = 3axy. \tag{1}$$


The origin O is an obvious rational point on the folium; moreover, O is 
a double point of the curve, as Figure 10.2 makes clear. The line y = tx 
through O therefore meets the folium at one other point P, and varying t
gives all other points P on the curve. By finding the coordinates of P as 
functions of t, we therefore obtain a parameterization. 

To find P we substitute y = tx in (1), obtaining

$$x^3 + t^3x^3 = 3axtx,$$
hence
$$x = \frac{3at}{1 + t^3}, \tag{2}$$
and therefore
$$y = \frac{3at^2}{1 + t^3}. \tag{3}$$

Figure 10.2: Parameterizing the folium

(This derivation was implicit in Exercise 6.3.1.) A similar construction
applies to any cubic with a double point, or more generally to any curve of
degree n + 1 with an n-tuple point; hence all such curves are of genus 0.
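
Formulas (2) and (3) can be checked in the same spirit with exact fractions. The following is a sketch under stated assumptions: the name `folium_point` and the default a = 1 are illustrative choices, and the value t = −1, where the denominator $1 + t^3$ vanishes, is excluded.

```python
from fractions import Fraction


def folium_point(t, a=1) -> tuple:
    """Point (3at/(1 + t^3), 3at^2/(1 + t^3)) on the folium x^3 + y^3 = 3axy."""
    t, a = Fraction(t), Fraction(a)
    denom = 1 + t ** 3
    if denom == 0:
        raise ValueError("t = -1 gives no finite point on the folium")
    return 3 * a * t / denom, 3 * a * t * t / denom


if __name__ == "__main__":
    a = 1
    for t in (Fraction(1), Fraction(2), Fraction(-1, 3)):
        x, y = folium_point(t, a)
        assert x ** 3 + y ** 3 == 3 * a * x * y  # the point lies on the curve
        assert y == t * x                        # and on the line y = tx
        print(t, (x, y))
```

Since y = tx, the parameter is recovered from a point with x ≠ 0 as t = y/x.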


EXERCISES 


It should be noted that a double point on a curve p(x, y) = 0 yields a double 
root of the equation p(x,mx + c) = 0 for the intersections of a line y = mx +c 
through the double point. 


10.2.1 Observe the double root of the equation obtained by substituting y = tx in 
equation (1) above. 


10.2.2 Explain, using the general double root property, why a line of rational slope 
through a rational double point on a cubic curve with rational coefficients 
necessarily meets the curve at another rational point. 


We note also that, as in the construction for quadratic curves, all rational 
points on the folium are obtained by this method. 


10.2.3 Show that if x and y are rational, then so is t in (2) and (3).


10.2.4 Deduce from Exercise 10.2.3 that the rational points on the folium are pre- 
cisely those with rational t-values. 
10.3 Rational Points on Cubics of Genus 1


We cannot yet give a precise definition of genus 1, but it happens to be the 
genus of all cubic curves that are not of genus 0. We know from Section 
10.2 that cubics of genus 1 cannot have double points, and in fact they also 
cannot have cusps because both these cases lead to rational parameteriza- 
tions. (For one case of a cusp, see Exercise 6.4.1.) What we have yet to find 
are functions that do parameterize cubics of genus 1. Such functions, the 
elliptic functions, were not defined until the 19th century, and they were 
first used by Clebsch (1864) to parameterize cubics. 

Many clues to the existence of elliptic functions were known before 
this, but at first they seemed to point in other directions. Initially, the mys- 
tery was how Diophantus and Fermat generated solutions of Diophantine 
equations. Newton’s (1670s) interpretation of their results by the chord–tangent
construction (Section 3.5) cleared up this first mystery, or would
have if anyone had noticed it at the time. But before mathematicians really
became conscious of the chord–tangent construction, they had to explain
some puzzling relations between integrals of functions such as
$1/\sqrt{ax^3 + bx^2 + cx + d}$, found by Fagnano (1718) and Euler (1768). Eventually
Jacobi (1834) noticed that the chord–tangent construction explained
this mystery too. Jacobi’s explanation was cryptic, and, even though ellip- 
tic functions were then known in connection with integrals, they were 
not fully absorbed into number theory and the theory of curves until the 
appearance of Poincaré (1901). 

The analytic origins of elliptic functions will be explained in the next 
sections. In this section we prepare to link up with this theory by deriving 
the algebraic relation between collinear points on a cubic curve. A much 
deeper treatment of the whole story appears in Weil (1984). 

We start with the cubic curve equation in Newton’s form (Section 6.4): 


$$y^2 = ax^3 + bx^2 + cx + d. \tag{1}$$
Figure 10.3 shows this curve when y = 0 for three distinct real values of x.

Figure 10.3: Collinear points on a cubic curve

In Section 3.5 we found that if a, b, c, d are rational, and if $P_1$, $P_2$ are
rational points on the curve, then the straight line through $P_1$, $P_2$ meets the
curve at a third rational point $P_3$. If the equation of this straight line is
$$y = tx + k, \tag{2}$$
then the result of substituting (2) in (1) is an equation
$$ax^3 + bx^2 + cx + d - (tx + k)^2 = 0 \tag{3}$$
for the x coordinates $x_1$, $x_2$, $x_3$ of the three points $P_1$, $P_2$, $P_3$. But if the
roots of (3) are $x_1$, $x_2$, $x_3$ its left-hand side must have the form
$$a(x - x_1)(x - x_2)(x - x_3).$$
In particular, the coefficient of $x^2$ must be
$$-a(x_1 + x_2 + x_3).$$
Comparing this with the actual coefficient of $x^2$ in (3), we find
$$b - t^2 = -a(x_1 + x_2 + x_3);$$
hence
$$x_3 = -(x_1 + x_2) - \frac{b - t^2}{a}. \tag{4}$$
If $P_1 = (x_1, y_1)$, $P_2 = (x_2, y_2)$, then the slope $t = (y_2 - y_1)/(x_2 - x_1)$, and
substituting this in (4) we finally obtain
$$x_3 = -(x_1 + x_2) - \frac{b - [(y_2 - y_1)/(x_2 - x_1)]^2}{a}, \tag{5}$$
giving $x_3$ as an explicit rational combination of the coordinates of $P_1$, $P_2$.
If $P_1$, $P_2$ are rational points, then (5) shows that $x_3$ (and hence $y_3 = tx_3 + k$)
is also rational, as we already knew.
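
The chord construction translates directly into a few lines of exact arithmetic. The sketch below assumes two rational points with distinct x coordinates (the tangent case is not handled), and `third_point` is a name chosen here for illustration. It computes the third point from the slope t and the relation $x_3 = -(x_1 + x_2) - (b - t^2)/a$, then verifies that the result lies on the curve.

```python
from fractions import Fraction


def third_point(a, b, c, d, P1, P2) -> tuple:
    """Third intersection of the chord P1P2 with y^2 = a x^3 + b x^2 + c x + d.

    Assumes a != 0 and that P1, P2 lie on the curve with different x coordinates.
    """
    x1, y1 = (Fraction(v) for v in P1)
    x2, y2 = (Fraction(v) for v in P2)
    if x1 == x2:
        raise ValueError("chord is vertical or the points coincide")
    t = (y2 - y1) / (x2 - x1)  # slope of the line y = t x + k
    k = y1 - t * x1            # intercept of that line
    x3 = -(x1 + x2) - (Fraction(b) - t * t) / Fraction(a)
    y3 = t * x3 + k
    # Sanity check: the computed point really satisfies the cubic equation.
    assert y3 * y3 == a * x3 ** 3 + b * x3 ** 2 + c * x3 + d
    return x3, y3


if __name__ == "__main__":
    # Example curve y^2 = x^3 - x + 1 with the rational points (0, 1) and (1, -1).
    print(third_point(1, 0, -1, 1, (0, 1), (1, -1)))
```

For this example the slope is t = −2, and the third point is (3, −5), which indeed satisfies 25 = 27 − 3 + 1.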
What is unexpected is that (5) is also an addition theorem for elliptic
functions. This has the consequence that the curve can be parameterized by
elliptic functions x = f(u), y = g(u) such that (5) is precisely the equation
expressing $x_3 = f(u_1 + u_2)$ in terms of $f(u_1) = x_1$, $f(u_2) = x_2$, $g(u_1) = y_1$,
and $g(u_2) = y_2$. Thus the straight-line construction of $x_3$ from $x_1$ and $x_2$
can also be interpreted as addition of parameter values $u_1$ and $u_2$ of $x_1$ and
$x_2$. The first addition theorems were found by Fagnano (1718) and Euler
(1768) by means of transformation of integrals. Euler realized that there 
was a connection between such transformations and number theory, but he 
could never quite put his finger on it. Even earlier, Leibniz had suspected 
such a connection when he wrote: 


I... remember having suggested (what could seem strange to 
some) that the progress in our integral calculus depended in 
good part upon the development of that type of arithmetic 
which, so far as we know, Diophantus has been the first to 
treat systematically. 


Leibniz (1702), as translated by Weil (1984) 


Jacobi (1834) apparently saw the connection for the first time after 
receiving a volume of Euler’s works on the transformation of integrals, but 
considerable clarification of elliptic functions was needed before Jacobi’s 
insight became generally available. We describe some of the main steps in 
this process of clarification below and in Chapter 12. 


EXERCISES 


A proof that specific curves cannot be parameterized by rational functions 
can be modeled on Fermat’s proof that $r^4 - s^4 = v^2$ is impossible in positive
integers. (This is why we said in Section 10.1 that Fermat’s theorem touches on 
elliptic curves.) The reason is that the behavior of rational functions is surprisingly 
similar to that of rational numbers, with polynomials playing the role of integers, 
and degree being the measure of size. The most convenient curve to illustrate the
idea is $y^2 = 1 - x^4$, which happens to be of genus 1, hence an elliptic curve.


10.3.1 Show that a parameterization of $y^2 = 1 - x^4$ by rational functions of u
implies that there are polynomials r(u), s(u), and v(u) with

$$r(u)^4 - s(u)^4 = v(u)^2.$$


Now to imitate the rest of Fermat’s proof (or the simplified version in Exercise 
10.1.4) one needs a theory of divisibility for polynomials. Like the theory for 
natural numbers, this can be based on the Euclidean algorithm. It follows the
same basic lines as in Section 3.3, so we omit it here, but see Section 16.5 for 
more details. 

One also needs the formula for “Pythagorean triples” of rational functions. 
This can be found by the geometric method of Section 1.3, carried out in the 
“rational function plane” where each “point” is an ordered pair (x(u), y(u)) of 
rational functions. 


10.3.2 Convince yourself that “lines” and “slope” make sense in the rational function
plane, and hence show that each point $\ne (-1, 0)$ on the “unit circle”
$$x(u)^2 + y(u)^2 = 1$$
is of the form
$$\left(\frac{1 - t(u)^2}{1 + t(u)^2},\ \frac{2t(u)}{1 + t(u)^2}\right)$$
for some rational function t(u).


10.3.3 Deduce from Exercise 10.3.2 a formula for “Pythagorean triples” of polynomials,
like Euclid’s formula for ordinary Pythagorean triples.

It is now possible to imitate Fermat’s proof, showing that $r(u)^4 - s(u)^4 = v(u)^2$
is impossible for polynomials, and hence that $y^2 = 1 - x^4$ has no parameterization
by rational functions. It follows that the same is true of certain cubic curves.

10.3.4 Substitute $x = (X + 1)/X$ and $y = Y/X^2$ in $y^2 = 1 - x^4$, and hence show
$$Y^2 = \text{cubic polynomial in } X.$$

Deduce that if this cubic curve in X, Y has a rational parameterization, then
so has $y^2 = 1 - x^4$.


10.4 Elliptic and Circular Functions 


The story of elliptic functions is one of the most curious in the history of 
mathematics, beginning with a complicated analytic idea (integrals of the
form $\int R(t, \sqrt{p(t)})\,dt$, where R is a rational function and p is a polynomial
of degree 3 or 4) and reaching a climax with a simple geometric idea,
the torus surface. Perhaps the best way to understand it is to compare it
with a fictitious history of circular functions that begins with the integral
$\int dt/\sqrt{1 - t^2}$ and ends with the discovery of the circle. Unlikely as this
fiction is, it was paralleled by the actual development of elliptic functions 
between the 1650s and the 1850s. 

The late recognition of the geometric nature of elliptic functions was 
due to late recognition of the existence and geometric nature of complex 
numbers. In fact, the later history of elliptic functions unfolds alongside 
the development of complex numbers, which is the subject of Chapter 12.
In the present chapter we are concerned mainly with the history up to 1800, 
before complex numbers entered in a really essential way. However, there 
are some subplots of the main story that do not require complex num- 
bers for their understanding and nicely show the parallel with the fictitious 
history of circular functions. It is convenient to relate one of these now, 
because it illustrates the parallel in a simplified way and also ties up a 
loose end from Section 10.3—the parameterization of cubic curves. 


Parameterization of Cubic Curves 


To see how to construct parameterizing functions for a cubic curve, we 
first reconstruct the parameterizing functions 


$$x = \sin u, \quad y = \cos u$$

for the circle $x^2 + y^2 = 1$, pretending that we do not know this curve
geometrically but only as an algebraic relation between x and y.

The sine function can be defined as the inverse f of $f^{-1}(x) = \sin^{-1} x$,
which in turn is definable as the integral

$$f^{-1}(x) = \int_0^x \frac{dt}{\sqrt{1 - t^2}}.$$
Finally, the integral can be related to the equation $y^2 = 1 - x^2$, because the
integrand $1/\sqrt{1 - x^2}$ is simply 1/y. Why do we use this integrand rather
than any other to define $u = f^{-1}(x)$ and hence obtain x as a function f(u)?
The answer is that we then obtain y as $f'(u)$; hence x, y are both functions
of the parameter u. This is confirmed by the calculations:

$$f'(u) = \frac{dx}{du} = \frac{1}{du/dx},$$

$$\frac{du}{dx} = \frac{d}{dx}\int_0^x \frac{dt}{\sqrt{1 - t^2}} = \frac{1}{\sqrt{1 - x^2}} = \frac{1}{y},$$

so $y = f'(u)$ (which of course is cos u).
Exactly the same construction can be used to parameterize any relation
of the form $y^2 = p(x)$. We put

$$u = g^{-1}(x) = \int_0^x \frac{dt}{\sqrt{p(t)}}$$

to get x = g(u), and then find that $y = g'(u)$ by differentiation of u. Thus
in a sense it is trivial to parameterize curves of the form $y^2 = p(x)$ (which
we know from Section 7.4 to include all cubic curves, up to a projective
transformation of x and y). As we will see in the next section, the integrals
$\int dt/\sqrt{p(t)}$ had been studied since the 1600s for p a polynomial of
degree 3 or 4; however, no one thought to invert them until about 1800.
Jacobi had a deep knowledge of both the integrals and inversion when he 
wrote his cryptic paper, Jacobi (1834), pointing out the relation between 
integrals and rational points on curves (Sections 10.3 and 10.7). Thus it 
seems likely he understood the preceding parameterization, though such a 
parameterization was first given explicitly by Clebsch (1864). 
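
The construction can be imitated numerically for the circle, where the answer is known. The sketch below approximates $u = \int_0^x dt/\sqrt{p(t)}$ by the midpoint rule (one simple quadrature choice among many) for $p(t) = 1 - t^2$, so that sin u and cos u can be compared with x and $y = \sqrt{1 - x^2}$. The names `parameter_u` and `circle_p`, and the step count, are assumptions made for illustration.

```python
import math


def parameter_u(p, x: float, steps: int = 10_000) -> float:
    """Midpoint-rule approximation of the integral of dt/sqrt(p(t)) from 0 to x.

    Assumes p(t) > 0 at every midpoint between 0 and x.
    """
    h = x / steps
    return sum(h / math.sqrt(p((i + 0.5) * h)) for i in range(steps))


def circle_p(t: float) -> float:
    """p(t) = 1 - t^2, so that y^2 = p(x) is the unit circle."""
    return 1.0 - t * t


if __name__ == "__main__":
    x = 0.5
    y = math.sqrt(circle_p(x))
    u = parameter_u(circle_p, x)
    print("u      =", u, " (asin x =", math.asin(x), ")")
    print("sin u  =", math.sin(u), " x =", x)
    print("cos u  =", math.cos(u), " y =", y)
```

Replacing `circle_p` by another polynomial, such as $1 - t^4$, gives values of the corresponding integral. Inverting it then has to be done numerically, since no elementary formula like the sine is available.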


EXERCISES 


It may happen that the integral $\int_0^x dt/\sqrt{p(t)}$ does not converge because of
the behavior of $1/\sqrt{p(t)}$ at t = 0. But in that case one can use the parameter
$u = f^{-1}(x) = \int_a^x dt/\sqrt{p(t)}$ for some other value of a.


10.4.1 Check that y = f’(u) remains true with this change of definition. 


When the cubic curve is $y^2 = x^3$, which has a rational parameterization, the
parameterizing functions constructed above indeed turn out to be rational.


10.4.2 Given $y = x^{3/2}$, find x = f(u) and $y = f'(u)$, where $u = f^{-1}(x) = \int_a^x dt/t^{3/2}$.


10.5 Elliptic Integrals 


Integrals of the form $\int R(t, \sqrt{p(t)})\,dt$, where R is a rational function and p
is a polynomial of degree 3 or 4 without multiple factors, are called elliptic 
integrals, because the first example occurs in the formula for the arc length 
of the ellipse. (The functions obtained by inverting elliptic integrals are 
called elliptic functions, and the curves that require elliptic functions for 
their parameterization are called elliptic curves. This drift in the meaning 
of “elliptic” is rather unfortunate because the ellipse, being parameteriz- 
able by rational functions, is not an elliptic curve!) 

Elliptic integrals arise in many important problems of geometry and 
mechanics, for example, finding arc lengths of the ellipse and hyperbola, 
period of the simple pendulum, and deflection of a thin elastic bar. See for 
example, Melzak (1976), pp. 253-269. When these problems first arose in 
the late 17th century they were the first obstacle to Leibniz’s program of 
integration in “closed form” or “by elementary functions.” As mentioned 
in Section 8.6, Leibniz thought the proper solution of an integration problem
$\int f(x)\,dx$ was a known function g(x) with the property $g'(x) = f(x)$.
The functions then “known,” and now called “elementary,” were those 
composed from algebraic, circular, and exponential functions and their 
inverses. 

All efforts to express elliptic integrals in these terms failed, and as early 
as 1694 Jakob Bernoulli conjectured that the task was impossible. The 
conjecture was eventually confirmed by Liouville (1833), in the course of 
showing that a large class of integrals is nonelementary. In the meantime, 
mathematicians had discovered so many properties of elliptic integrals, 
and the elliptic functions obtained from them by inversion, that they could 
be considered known even if not elementary. 

The key that unlocked many of the secrets of elliptic integrals was the 
curve known as the lemniscate of Bernoulli (Figure 10.4). This curve was 
mentioned briefly in Section 2.5 as one of the spiric sections of Perseus. 


Figure 10.4: The lemniscate of Bernoulli

It has cartesian equation

$$(x^2 + y^2)^2 = x^2 - y^2$$

and polar equation

$$r^2 = \cos 2\theta.$$


The first to consider it in its own right was Jakob Bernoulli (1694). He
showed that its arc length is the elliptic integral $\int_0^x dt/\sqrt{1 - t^4}$, later known
as the lemniscatic integral, thus giving this formal expression a concrete
geometric interpretation. Many later developments in the theory of elliptic
integrals and functions grew from interplay between the lemniscate and
the lemniscatic integral. As the simplest elliptic integral, or at any rate the
most analogous to the arcsine integral $\int_0^x dt/\sqrt{1 - t^2}$, the lemniscatic integral
$\int_0^x dt/\sqrt{1 - t^4}$ was the most amenable to manipulation. It was often
possible, after some property had been proved for the lemniscatic integral,
to extend the argument to more general elliptic integrals.

The most notable example of this methodology was in the discovery 
of the addition theorems, which we discuss in the next section. 


EXERCISES 


The properties of the lemniscate mentioned above are easily proved by some 
standard analytic geometry and calculus. 


10.5.1 Deduce the cartesian equation of the lemniscate from its polar equation

$$r^2 = \cos 2\theta.$$

10.5.2 Use the polar equation of the lemniscate and the formula for the element
of arc in polar coordinates,

$$ds = \sqrt{(r\,d\theta)^2 + dr^2},$$

to deduce that arc length of the lemniscate is given by

$$s = \int \frac{d\theta}{r}.$$

10.5.3 Conclude, by changing the variable of integration to $r$, that the total length
of the lemniscate is $4\int_0^1 dr/\sqrt{1 - r^4}$.


Unlike the arcsine integrand $1/\sqrt{1 - t^2}$, which is rationalized by substituting
$2v/(1 + v^2)$ for $t$, the lemniscatic integrand $1/\sqrt{1 - t^4}$ cannot be rationalized by
replacing $t$ by any rational function.


10.5.4 Explain how this follows from the exercises in Section 10.3. 


It was this connection between the lemniscatic integral and Fermat’s theorem
on the impossibility of $r^4 - s^4 = v^2$ in positive integers that led Jakob Bernoulli
to suspect the impossibility of evaluating the lemniscatic integral by known functions.


10.6 Doubling the Arc of the Lemniscate


An addition theorem is a formula expressing $f(u_1 + u_2)$ in terms of $f(u_1)$
and $f(u_2)$, and perhaps also $f'(u_1)$ and $f'(u_2)$. For example, the addition
theorem for the sine function is

$$\sin(u_1 + u_2) = \sin u_1 \cos u_2 + \sin u_2 \cos u_1.$$

Since the derivative, $\cos u$, of $\sin u$ equals $\sqrt{1 - \sin^2 u}$, we can also write
the addition theorem as

$$\sin(u_1 + u_2) = \sin u_1 \sqrt{1 - \sin^2 u_2} + \sin u_2 \sqrt{1 - \sin^2 u_1},$$

showing that $\sin(u_1 + u_2)$ is an algebraic function of $\sin u_1$ and $\sin u_2$.
To simplify the comparison with elliptic functions we consider the following
special case of the sine addition theorem:

$$\sin 2u = 2\sin u \sqrt{1 - \sin^2 u}. \qquad (1)$$


If we let

$$u = \sin^{-1} x = \int_0^x \frac{dt}{\sqrt{1 - t^2}},$$

then

$$2u = 2\int_0^x \frac{dt}{\sqrt{1 - t^2}}.$$

But from (1) we also have

$$2u = \sin^{-1}\left(2x\sqrt{1 - x^2}\right),$$

so

$$2\int_0^x \frac{dt}{\sqrt{1 - t^2}} = \int_0^{2x\sqrt{1 - x^2}} \frac{dt}{\sqrt{1 - t^2}}. \qquad (2)$$

Bearing in mind that $\sin^{-1} x = \int_0^x dt/\sqrt{1 - t^2}$ represents the angle $u$ seen in
Figure 10.5, equation (2) tells us that the angle (or arc length) $u$ is doubled
by going from $x$ to $2x\sqrt{1 - x^2}$. The latter number, since it is obtained from
$x$ by rational operations and square roots, is constructible from $x$ by ruler
and compass (confirming the geometrically obvious fact that an angle can
be duplicated by ruler and compass).


Figure 10.5: Doubling a circular arc 


All this has a remarkable parallel in the properties of the lemniscate
and its arc-length integral $\int_0^x dt/\sqrt{1 - t^4}$. The discovery of a formula for
doubling the arc of the lemniscate by Fagnano (1718) showed that geometric
information could be extracted from the previously intractable elliptic
integrals, and we can also view it as the first step toward the theory of
elliptic functions. In our notation, Fagnano’s formula is

$$2\int_0^x \frac{dt}{\sqrt{1 - t^4}} = \int_0^{2x\sqrt{1 - x^4}/(1 + x^4)} \frac{dt}{\sqrt{1 - t^4}}. \qquad (3)$$

Since $2x\sqrt{1 - x^4}/(1 + x^4)$ is obtained from $x$ by rational operations and
square roots, (3) shows, like (2), that the arc can be doubled by ruler and
compass construction.
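Formula (3) can be checked numerically. The following is only a sketch under stated assumptions: the function names are ours, the substitution $t = \sin\varphi$ (which turns the integrand into the smooth function $1/\sqrt{1 + \sin^2\varphi}$) is a convenience for the computation and not part of Fagnano's argument, and $x$ is taken small enough that the doubled arc still lies within a quadrant of the lemniscate.

```python
import math

def lemniscatic_integral(x, steps=2000):
    """Approximate the integral of dt / sqrt(1 - t^4) from 0 to x, for 0 <= x <= 1.

    Assumption of this sketch: substitute t = sin(phi), so the integrand
    becomes 1 / sqrt(1 + sin(phi)^2), then apply Simpson's rule.
    """
    upper = math.asin(x)
    h = upper / steps  # steps must be even for Simpson's rule
    f = lambda phi: 1.0 / math.sqrt(1.0 + math.sin(phi) ** 2)
    total = f(0.0) + f(upper)
    for k in range(1, steps):
        total += (4 if k % 2 else 2) * f(k * h)
    return total * h / 3.0

def fagnano_double(x):
    """Upper limit of integration on the right-hand side of formula (3)."""
    return 2 * x * math.sqrt(1 - x**4) / (1 + x**4)

x = 0.5
left = 2 * lemniscatic_integral(x)                  # twice the arc from 0 to x
right = lemniscatic_integral(fagnano_double(x))     # arc from 0 to the doubled point
print(left, right)  # the two values should agree to many decimal places
```

The same helper, applied to $x = 1$, gives the quarter length of the lemniscate.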


EXERCISES 


Fagnano derived his formula by two substitutions that, as Siegel (1969), p. 3,
points out, are analogous to a natural substitution for the arcsine integral. The
following exercises compare the effect of the substitution $t = 2v/(1 + v^2)$ in
$dt/\sqrt{1 - t^2}$ with analogous substitutions for $t^2$ in $dt/\sqrt{1 - t^4}$.


10.6.1 Show that substituting $t = 2v/(1 + v^2)$ gives $\sqrt{1 - t^2} = (1 - v^2)/(1 + v^2)$ and
hence that $dt/\sqrt{1 - t^2} = 2\,dv/(1 + v^2)$.

10.6.2 Show that $t^2 = 2v^2/(1 + v^4)$ gives $\sqrt{1 - t^4} = (1 - v^4)/(1 + v^4)$ and hence

$$\frac{dt}{\sqrt{1 - t^4}} = \sqrt{2}\,\frac{dv}{\sqrt{1 + v^4}}.$$


It follows that this change of variable corresponds to a certain relation between
integrals, which turns out to be half way to the Fagnano formula.


10.6.3 Deduce from Exercise 10.6.2 that

$$\sqrt{2}\int_0^x \frac{dv}{\sqrt{1 + v^4}} = \int_0^{\sqrt{2}\,x/\sqrt{1 + x^4}} \frac{dt}{\sqrt{1 - t^4}}.$$


To complete the journey to the Fagnano formula we make a second, similar,
substitution that recreates the lemniscatic integral.


10.6.4 Similarly show that the substitution $v^2 = 2w^2/(1 - w^4)$ gives

$$\frac{dv}{\sqrt{1 + v^4}} = \sqrt{2}\,\frac{dw}{\sqrt{1 - w^4}}.$$

10.6.5 Check that the result of the substitutions in Exercises 10.6.2 and 10.6.4 is

$$t = \frac{2w\sqrt{1 - w^4}}{1 + w^4}$$

and that the corresponding relation between integrals is the Fagnano duplication
formula.


10.7 General Addition Theorems 


The Fagnano duplication formula was a little-known curiosity until Euler 
received a copy of Fagnano’s works on December 23, 1751, a date later 
described by Jacobi as “the birth day of the theory of elliptic functions.” 
Euler was the first to see that Fagnano’s substitution trick was not just a 
curious fluke but a revelation of the behavior of elliptic integrals. With 
his superb manipulative skill Euler was quickly able to extend it to very 
general addition theorems; first to the addition theorem for the lemniscatic 
integral, 


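$$\int_0^x \frac{dt}{\sqrt{1 - t^4}} + \int_0^y \frac{dt}{\sqrt{1 - t^4}} = \int_0^{\left(x\sqrt{1 - y^4} + y\sqrt{1 - x^4}\right)/(1 + x^2 y^2)} \frac{dt}{\sqrt{1 - t^4}},$$

then to $\int dt/\sqrt{p(t)}$, where $p(t)$ is an arbitrary polynomial of degree 4. An
ingenious reconstruction of Euler’s train of thought, by analogy with the
arcsine addition theorem

$$\int_0^x \frac{dt}{\sqrt{1 - t^2}} + \int_0^y \frac{dt}{\sqrt{1 - t^2}} = \int_0^{x\sqrt{1 - y^2} + y\sqrt{1 - x^2}} \frac{dt}{\sqrt{1 - t^2}},$$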


has been given by Siegel (1969), pp. 1-10. Of course, Euler was dealing 
only with elliptic integrals, not with elliptic functions. But Jacobi could 
see his results as addition theorems for elliptic functions as easily as we 
can see that the arcsine addition theorem is really a theorem about sines! 


It should be mentioned that Euler’s addition theorems do not cover all 
kinds of elliptic integrals. The classical theory of elliptic integrals of the 
different kinds, with their various addition and transformation theorems, 
was systematized by Legendre (1825). Ironically, this was just before the 
appearance of elliptic functions, which made much of Legendre’s work 
obsolete. 


These early investigations exploited some of the formal similarities
between $\int dt/\sqrt{p(t)}$, where $p$ is a polynomial of degree 4, and $\int dt/\sqrt{q(t)}$,
where $q$ is a quadratic. There is no real difference if $p$ is of degree 3, as
an easy transformation shows (Exercise 10.7.1). This is why $\int dt/\sqrt{p(t)}$ is
also called an elliptic integral when $p$ is of degree 3. In fact, it eventually
turned out that the most convenient integral to use as a basis for the theory
of elliptic functions is $\int dt/\sqrt{4t^3 - g_2 t - g_3}$, whose inverse is known as
the Weierstrass $\wp$-function.


The addition theorem for this integral is

$$\int_0^{x_1} \frac{dt}{\sqrt{4t^3 - g_2 t - g_3}} + \int_0^{x_2} \frac{dt}{\sqrt{4t^3 - g_2 t - g_3}} = \int_0^{x_3} \frac{dt}{\sqrt{4t^3 - g_2 t - g_3}},$$

where $x_3$ is none other than the $x$-coordinate of the third point on

$$y^2 = 4x^3 - g_2 x - g_3$$

of the straight line through $(x_1, y_1)$ and $(x_2, y_2)$ (see Section 10.3). Now that
we know, from Section 10.4, that this curve is parameterized by $x = \wp(u)$,
$y = \wp'(u)$, defined by inverting the integral, some connection between the
geometry of the curve and the addition theorem is understandable. But the
stunning simplicity of the relationship seems to demand a deeper explanation.
This lies in the realm of complex numbers, which we enter briefly in
the next section and more thoroughly in Chapter 12.


EXERCISES


10.7.1 Show that the substitution $t = 1/u$ transforms

$$\frac{dt}{\sqrt{(t - a)(t - b)(t - c)}} \quad\text{into}\quad \frac{-du}{\sqrt{u(1 - ua)(1 - ub)(1 - uc)}}.$$


Conversely, we can transform quartic polynomials under the square root sign to
cubics, even in cases where the quartic is not of the form obtained in Exercise
10.7.1.

10.7.2 Transform

$$\frac{dt}{\sqrt{1 - t^4}} \quad\text{into}\quad \frac{du}{\sqrt{\text{cubic polynomial in } u}}$$

by making a suitable substitution for $t$.


10.8 Elliptic Functions 


The idea of inverting elliptic integrals to obtain elliptic functions is due 
to Gauss, Abel, and Jacobi. Gauss had the idea in the late 1790s but did 
not publish it; Abel had the idea in 1823 and published it in 1827, inde- 
pendently of Gauss. Jacobi seems to have been approaching the idea of 
inversion in 1827, but was stung into action only by the appearance of 
Abel’s paper. His ideas then developed at an explosive rate, and he pub- 
lished the first book on elliptic functions, the Fundamenta nova theoriae 
functionum ellipticarum, two years later (Jacobi (1829)). 

Gauss first considered inverting an elliptic integral in 1796, in the case
of $\int dt/\sqrt{1 - t^3}$. The next year he inverted the lemniscatic integral and
made more progress. Defining the lemniscatic sine function $x = \mathrm{sl}(u)$ by

$$u = \int_0^x \frac{dt}{\sqrt{1 - t^4}},$$

he found that this function is periodic, like the sine, with period

$$2\varpi = 4\int_0^1 \frac{dt}{\sqrt{1 - t^4}}.$$

He also noticed that $\mathrm{sl}(u)$ invites complex arguments, since $i^2 = -1$ implies

$$\frac{d(it)}{\sqrt{1 - (it)^4}} = i\,\frac{dt}{\sqrt{1 - t^4}},$$

so $\mathrm{sl}(iu) = i\,\mathrm{sl}(u)$ and the lemniscatic sine has a second period $2i\varpi$. Thus
Gauss discovered double periodicity, a key property of the elliptic func- 
tions, though at first he did not realize its significance. The scope and 
importance of elliptic functions hit him on May 30, 1799, when he found 
an extraordinary numerical coincidence. His diary entry of that day reads: 

We have established that the arithmetic-geometric mean between
1 and $\sqrt{2}$ is $\pi/\varpi$ to 11 places; the demonstration of this fact
will surely open up an entirely new field of analysis.


Gauss had been fascinated by the arithmetic-geometric mean (agM) 
since 1791, when he was 14. The agM(a, b) of two positive numbers a and 
b is the common limit of the two sequences {a,} and {b,} defined by 


$$a_0 = a, \qquad b_0 = b,$$
$$a_{n+1} = \frac{a_n + b_n}{2}, \qquad b_{n+1} = \sqrt{a_n b_n}.$$
For more information on its theory and history, see Cox (1984). 

It is indeed true that agM$(1, \sqrt{2}) = \pi/\varpi$, as Gauss soon proved, and
the “entirely new field of analysis” he created from the stew of these ideas 
was extraordinarily rich. It encompassed elliptic functions in general, the 
theta functions later rediscovered by Jacobi, and the modular functions 
later rediscovered by Klein. The theory was not significantly improved 
until the 1850s, when Riemann showed that double periodicity becomes 
obvious when elliptic integrals are placed in a suitable geometric setting. 
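The coincidence that struck Gauss is easy to reproduce today. The sketch below iterates the agM recurrence defined above and compares the result with $\pi/\varpi$, where $\varpi = 2\int_0^1 dt/\sqrt{1 - t^4}$. The function names, the fixed iteration count, and the substitution $t = \sin\varphi$ used to remove the endpoint singularity are our own assumptions, not Gauss's method.

```python
import math

def agm(a, b, iterations=20):
    """Arithmetic-geometric mean: iterate a <- (a + b)/2, b <- sqrt(a*b).

    Convergence is very fast, so a fixed, generous iteration count
    (an assumption of this sketch) is enough in double precision.
    """
    for _ in range(iterations):
        a, b = (a + b) / 2.0, math.sqrt(a * b)
    return a

def lemniscatic_quarter(steps=2000):
    """Integral of dt / sqrt(1 - t^4) from 0 to 1.

    With t = sin(phi) this becomes the integral of
    1 / sqrt(1 + sin(phi)^2) over [0, pi/2], evaluated by Simpson's rule.
    """
    upper = math.pi / 2
    h = upper / steps  # steps must be even
    f = lambda phi: 1.0 / math.sqrt(1.0 + math.sin(phi) ** 2)
    total = f(0.0) + f(upper)
    for k in range(1, steps):
        total += (4 if k % 2 else 2) * f(k * h)
    return total * h / 3.0

varpi = 2 * lemniscatic_quarter()        # so that the period is 2*varpi
gauss_agm = agm(1.0, math.sqrt(2.0))
print(gauss_agm, math.pi / varpi)        # the two numbers should coincide
```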

Unfortunately, Gauss released virtually none of his results on elliptic 
functions. Apart from a formula for agM(a, b) as an elliptic integral (Gauss 
(1818)), he published nothing until Abel’s results appeared in 1827—then 
promptly claimed them as his own. He wrote to Bessel (Gauss (1828)): 

I shall most likely not soon prepare my investigations on the 
transcendental functions which I have had for many years— 
since 1798. ... Herr Abel has now, as I see, anticipated me 
and relieved me of the burden in regard to one third of these 
matters. 


It was disingenuous of Gauss to claim he had more results than Abel, 
because Abel also had results unknown to Gauss. True, Gauss had prior- 
ity on the key ideas of inversion and double periodicity, but priority isn’t 
everything, as Gauss himself perhaps knew. His own cherished discovery 
of the relation between agM and elliptic integrals had not only been found 
earlier, but even published by Lagrange (1785).


A Postscript on the Lemniscate


The duplication of the arc of the lemniscate had some interesting conse- 
quences for the lemniscate itself. Fagnano showed, by similar arguments, 
that a quadrant of the lemniscate can be divided into two, three, or five 
equal arcs by ruler and compass (see Ayoub (1984)). This raised a ques- 
tion: for which n can the lemniscate be divided into n equal parts by ruler 
and compass? Recall from Section 2.3 that the corresponding question for 
the circle had been answered by Gauss (1801), Art. 366. As mentioned
there, the answer is $n = 2^m p_1 p_2 \cdots p_k$, where the $p_i$ are distinct primes of
the form $2^{2^h} + 1$. In the introduction to his theory (Art. 355), Gauss claims:


The principles of the theory which we are going to explain 
actually extend much further than we will indicate. For they 
can be applied not only to circular functions but just as well 
to other transcendental functions, e.g. to those which depend 


on the integral $\int (1/\sqrt{1 - x^4})\,dx$.


However, his surviving papers do not include any result on the lemnis- 
cate as incisive as his result on the circle. There is only a diary entry of 
March 21, 1797, stating divisibility of the lemniscate into five equal parts. 

The answer to the problem of dividing the lemniscate into n equal 
parts was found by Abel (1827), transforming Gauss’s obscurity into crys- 
tal clarity: division by ruler and compass is possible for precisely the same 
n as for the circle. This wonderful result serves, perhaps better than any 
other, to underline the unifying role of elliptic functions in geometry, alge- 
bra, and number theory. A modern proof of it may be found in Rosen 
(1981). 


EXERCISES 


The following exercises show how the lemniscatic sine and its derivative are 
quite analogous to the ordinary sine and its derivative, the cosine. 


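10.8.1 Show that $\mathrm{sl}'(u) = \sqrt{1 - \mathrm{sl}^4(u)}$.
10.8.2 Deduce from the Euler addition theorem (Section 10.7) that

$$\mathrm{sl}(u + v) = \frac{\mathrm{sl}(u)\,\mathrm{sl}'(v) + \mathrm{sl}(v)\,\mathrm{sl}'(u)}{1 + \mathrm{sl}^2(u)\,\mathrm{sl}^2(v)}.$$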


Complex Numbers and
Curves 


PREVIEW 

This chapter revisits polynomial equations and algebraic curves, observing 
how these topics are simplified by introducing complex numbers. That’s 
right: the so-called “complex” numbers actually make things simpler. 

One of the reasons for the simplifying power of complex numbers is 
their two-dimensional nature. The extra dimension gives more room for 
solutions of equations to exist. For example, the equation $x^n = 1$, which
has only one or two solutions in the real numbers, has n different solutions 
in the complex numbers, equally spaced around the unit circle. 

In fact, any equation of degree n has n complex solutions, when solu- 
tions are properly counted. This is the fundamental theorem of algebra, 
and it follows from intuitively simple properties of the plane and continu- 
ous functions. 

The fundamental theorem also enables us to get the “right” number 
of intersections between a curve of degree m and a curve of degree n. 
However, it is not enough to introduce complex coordinates: getting the 
right count of intersections also requires us to adjust our viewpoint in two 
other ways: by counting intersections according to their multiplicity, and 
by counting points at infinity. 

For these reasons, and others, algebraic geometry moved to the setting 
of complex projective space in the 19th century. In this chapter we see how 
this affects our view of algebraic curves: in short, they become surfaces. 
© Springer Nature Switzerland AG 2020


11.1 Impossible Numbers


In previous chapters it has often been claimed that certain mysteries— 
de Moivre’s formula for $\sin n\theta$ (Section 5.6), factorization of polynomials
(Section 5.7), classification of cubic curves (Section 7.4), and the behavior 
of elliptic functions (Section 10.8)—are cleared up by the introduction of 
complex numbers. That complex numbers do all this and more is one of 
the miracles of mathematics. At the beginning of their history, complex 
numbers $a + b\sqrt{-1}$ were considered to be “impossible numbers,” tolerated
only because they seemed useful for solving cubic equations. But their 
significance turned out to be geometric and ultimately led to the unification 
of algebra with an enriched domain of geometry, including topology and 
another “impossible” field, non-Euclidean geometry. 

In this chapter we will see how complex numbers emerged from the 
theory of equations and enabled its fundamental theorem to be proved—at 
which point it became clear that complex numbers had meaning far beyond 
algebra. Their impact on curves and function theory is described later in 
this chapter and in the next. Non-Euclidean geometry had entirely different 
origins but arrived at the same place as complex function theory in the 
1880s, thanks to complex numbers. This unexpected meeting is described 
in Chapter 13. 


Quadratic Equations 


In theory, mathematics first calls on complex numbers to solve certain 
quadratic equations, such as the equation $x^2 + 1 = 0$. However, this did not
happen when quadratic equations first appeared, since at that time there 
was no need for all quadratic equations to have solutions. Many quadratic 
equations are implicit in Greek geometry, but one does not demand that 
every geometric problem have a solution. If one asks whether a particular 
circle and line intersect, say, then the answer can be yes or no. If yes, 
the quadratic equation for the intersection has a solution; if no, it has no 
solution. An “imaginary solution” is uncalled for in this context. 

Even when quadratic equations appeared in pure algebra, with Dio- 
phantus and the Arab mathematicians, there was initially no reason for 
complex solutions. One only wanted to know whether there were real solu- 
tions, and if not the answer was simply—no solution. This is the appro- 
priate answer when quadratics are solved by geometrically completing the 
square (Section 5.3), as was done up to the time of Cardano. A square of
negative area did not exist in geometry. The story might have been different
had mathematicians used symbols more and dared to consider the
symbol $\sqrt{-1}$ as an object in its own right, but this did not happen until
quadratics had been overtaken by cubics, at which stage complex numbers 
became unavoidable, as we will now see. 


11.2 Cubic Equations 


The del Ferro–Tartaglia–Cardano solution of the cubic equation

$$y^3 = py + q$$

is

$$y = \sqrt[3]{\frac{q}{2} + \sqrt{\left(\frac{q}{2}\right)^2 - \left(\frac{p}{3}\right)^3}} + \sqrt[3]{\frac{q}{2} - \sqrt{\left(\frac{q}{2}\right)^2 - \left(\frac{p}{3}\right)^3}},$$

as we saw in Section 5.5. We notice that it involves complex numbers
when $(q/2)^2 - (p/3)^3 < 0$. However, one cannot dismiss this as a case
with no solution, because a cubic always has at least one real root (since
$y^3 - py - q$ is positive for large positive $y$ and negative for large negative $y$).
Thus the Cardano formula raises the problem of reconciling a real value,
found by inspection, say, with an expression of the form

$$\sqrt[3]{a + b\sqrt{-1}} + \sqrt[3]{a - b\sqrt{-1}}.$$


Cardano did not face up to this problem in his Ars magna (1545). He 
did, it is true, once mention complex numbers, but in connection with a 
quadratic equation and accompanied by the comment that these numbers 
were “as subtle as they are useless” (Cardano (1545), Ch. 37, Rule II). 

The first to take complex numbers seriously and use them to achieve 
the necessary reconciliation was Bombelli (1572). Bombelli worked out 
the formal algebra of complex numbers, with the particular aim of reducing
expressions $\sqrt[3]{a + b\sqrt{-1}}$ to the form $c + d\sqrt{-1}$. His method enabled
him to show the reality of some expressions resulting from Cardano’s formula.
For example, the solution of

$$x^3 = 15x + 4$$

is

$$x = \sqrt[3]{2 + 11\sqrt{-1}} + \sqrt[3]{2 - 11\sqrt{-1}}$$

according to the formula. On the other hand, inspection gives the solution
$x = 4$. Bombelli had the hunch that the two parts of $x$ in the Cardano
formula were of the form $2 + n\sqrt{-1}$ and $2 - n\sqrt{-1}$. He found, by cubing
these expressions formally [using $(\sqrt{-1})^2 = -1$, and $n = 1$], that indeed

$$\sqrt[3]{2 + 11\sqrt{-1}} = 2 + \sqrt{-1},$$
$$\sqrt[3]{2 - 11\sqrt{-1}} = 2 - \sqrt{-1},$$

hence the Cardano formula also gives their sum $x = 4$.
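Bombelli's reconciliation can be replayed with modern complex arithmetic. The sketch below is an illustration, not Bombelli's procedure: the function name is ours, and it assumes the case $(q/2)^2 - (p/3)^3 < 0$, where the two radicands are complex conjugates and the principal complex cube roots (as computed by Python) are again conjugates, so their sum is real.

```python
import cmath

def cardano_irreducible(p, q):
    """Evaluate the two cube roots in Cardano's formula for y^3 = p*y + q.

    Assumption of this sketch: (q/2)^2 - (p/3)^3 < 0, so the radicands are
    complex conjugates and principal cube roots give conjugate values.
    """
    disc = (q / 2) ** 2 - (p / 3) ** 3
    if disc >= 0:
        raise ValueError("this sketch only handles a negative discriminant")
    s = cmath.sqrt(disc)              # for p = 15, q = 4 this is 11*sqrt(-1)
    u = (q / 2 + s) ** (1 / 3)        # principal cube root of 2 + 11*sqrt(-1)
    v = (q / 2 - s) ** (1 / 3)        # principal cube root of 2 - 11*sqrt(-1)
    return u, v

# Bombelli's example x^3 = 15x + 4
u, v = cardano_irreducible(15, 4)
print(u, v, u + v)                    # close to 2 + i, 2 - i, and 4

# Bombelli's formal cubing of 2 + sqrt(-1), done by repeated multiplication
bombelli = complex(2, 1)
cube = bombelli * bombelli * bombelli
print(cube)                           # the cube of 2 + sqrt(-1)
```

The same function applied to $p = 30$, $q = 36$ reproduces the "concocted" example of the exercises, whose obvious solution is 6.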

Figure 11.1 is a facsimile of the manuscript page on which Bombelli 
stated his result: Somma 4. The figure is from a 1569 version of Bombelli’s 
L’Algebra: page 72 verso in codice B. 1569, which is in the Biblioteca 
dell’ Archiginnasio in Bologna, and is used with their permission. It was 
transcribed from Bombelli’s lectures by F. M. Salando.

Figure 11.1: Bombelli’s manuscript

He has placed the problem and its solution inside a decorative border.
It has the equation $x^3 = 15x + 4$ at the top (in his notation, which does
not show the variable $x$, only its coefficients and, directly above them,
its exponents), and the conclusion $(2 + \sqrt{-1}) + (2 - \sqrt{-1}) = 4$ at the
bottom. Bombelli includes the trivial calculations of $5 \times 5 \times 5 = 125$ and
$2 \times 2 = 4$, needed for the Cardano formula. But he does not include the
crucial calculation of $(2 + \sqrt{-1})^3$ needed to remove the cube roots; he
simply removes them without explanation!

It is not hard to pick out the preceding expressions when one allows
for the notation and the fact that $11\sqrt{-1}$ is written as $\sqrt{0 - 121}$. Note in
particular the sign R for “root,” which today is still in use by pharmacists 
(presumably because of the roots once common for medical purposes). 

Much later, Hölder (1896) showed that any algebraic formula for the
solution of the cubic must involve square roots of quantities that become
negative for particular values of the coefficients. A proof of Hölder’s result
may be found in van der Waerden (1949), p. 180. 


EXERCISES 
11.2.1 Check that $(2 + \sqrt{-1})^3 = 2 + 11\sqrt{-1}$.


It is possible to work backwards and concoct a cubic equation with an obvious
solution that can be reconciled with the hideous solution in the Cardano formula.
Here is an example.


11.2.2 Check that $(3 + \sqrt{-1})^3 = 18 + 26\sqrt{-1}$.
11.2.3 Hence explain why

$$6 = (3 + \sqrt{-1}) + (3 - \sqrt{-1}) = \sqrt[3]{18 + 26\sqrt{-1}} + \sqrt[3]{18 - 26\sqrt{-1}}.$$

11.2.4 Find $p$ and $q$ such that

$$18 = \frac{q}{2} \quad\text{and}\quad 26\sqrt{-1} = \sqrt{\left(\frac{q}{2}\right)^2 - \left(\frac{p}{3}\right)^3}.$$

11.2.5 Check that 6 is a solution of the equation $x^3 = px + q$ for the values of $p$
and $q$ found in Exercise 11.2.4.


11.3 Angle Division


In Section 5.6 we saw how Viète related angle trisection to the solution of
cubic equations, and how Leibniz (1675) and de Moivre (1707) solved the
angle $n$-section equation by the Cardano-type formula

$$x = \frac{1}{2}\sqrt[n]{y + \sqrt{y^2 - 1}} + \frac{1}{2}\sqrt[n]{y - \sqrt{y^2 - 1}}. \qquad (1)$$

We also saw how this and Viète’s formulas for $\cos n\theta$ and $\sin n\theta$ could
easily be explained by the formula

$$(\cos\theta + i\sin\theta)^n = \cos n\theta + i\sin n\theta \qquad (2)$$

usually associated with de Moivre. Actually, de Moivre never stated (2)
explicitly. The closest he came was to give a formula for $(\cos\theta + i\sin\theta)^{1/n}$
in de Moivre (1730). (See Smith (1959) for a series of extracts from the
work of de Moivre on angle division.) It seems that the clues in the algebra
of circular functions were not strong enough to reveal (2) until a deeper
reason for it had been brought to light by calculus.

Complex numbers made their entry into the theory of circular func- 
tions in a paper on integration by Johann Bernoulli (1702). Observing that 
$\sqrt{-1} = i$ makes possible the partial fraction decomposition

$$\frac{1}{1 + z^2} = \frac{1/2}{1 + zi} + \frac{1/2}{1 - zi},$$

Bernoulli saw that integration would give an expression for $\tan^{-1} z$ as an
imaginary logarithm, though he did not write down the expression in ques- 
tion and was evidently puzzled as to what it could mean. In Section 12.1 
we will see how Euler clarified Johann Bernoulli’s discovery and devel- 
oped it into the beautiful theory of complex logarithms and exponentials. 
What is relevant here is that Johann Bernoulli (1712) took up the idea 
again, and this time he carried out the integration to obtain an algebraic 
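relation between $\tan n\theta$ and $\tan\theta$. His argument is as follows. Given

$$y = \tan n\theta, \qquad x = \tan\theta,$$

we have

$$n\theta = \tan^{-1} y = n\tan^{-1} x;$$

hence, taking differentials gives

$$\frac{dy}{1 + y^2} = \frac{n\,dx}{1 + x^2}$$

or

$$dy\left(\frac{1}{y + i} - \frac{1}{y - i}\right) = n\,dx\left(\frac{1}{x + i} - \frac{1}{x - i}\right).$$

Integration gives

$$\log(y + i) - \log(y - i) = n\log(x + i) - n\log(x - i),$$

that is,

$$\log\frac{y + i}{y - i} = \log\left(\frac{x + i}{x - i}\right)^n,$$

whence

$$(x - i)^n (y + i) = (x + i)^n (y - i). \qquad (3)$$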
This formula was the first of the de Moivre type actually to use $i$ explicitly
and the first example of a phenomenon later articulated by Hadamard
(1954), Chapter VIII:


the shortest and best way between two truths in the real domain 
often passes through the imaginary one. 


Solving (3) for $y$ as a function of $x$ expresses $\tan n\theta$ as a rational function
of $\tan\theta$, which is difficult to obtain using real formulas alone. In fact, it is
easy to show from (3) that $y$ is the quotient of the polynomials consisting
of alternate terms in $(x + i)^n$, provided with alternate + and − signs (see
exercises).
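A numerical experiment makes the relation concrete. The sketch below solves the relation $(y + i)/(y - i) = D\,(x + i)^n/(x - i)^n$ for $y$ and compares it with $\tan n\theta$; formula (3) is the case $D = 1$, and the constant $D$ is the one the exercises at the end of this section introduce to account for a constant of integration. The function name is ours, and the choice $D = (-1)^{n+1}$ is our own observation, checked here only for $n = 1, \ldots, 4$.

```python
import math

def tan_multiple_angle(n, x, D):
    """Solve (y + i)/(y - i) = D * ((x + i)/(x - i))**n for y.

    Writing r for the right-hand side, y + i = r*(y - i) gives
    y = i*(r + 1)/(r - 1).
    """
    r = D * ((x + 1j) / (x - 1j)) ** n
    return 1j * (r + 1) / (r - 1)

theta = 0.3
x = math.tan(theta)
for n in (1, 2, 3, 4):
    D = (-1) ** (n + 1)                  # assumption of this sketch: sign alternates with n
    y = tan_multiple_angle(n, x, D)
    print(n, y.real, math.tan(n * theta))  # the real part should match tan(n*theta)

# With the uncorrected constant D = 1, the case n = 2 does not give tan(2*theta)
wrong = tan_multiple_angle(2, x, 1)
print(wrong.real, math.tan(2 * theta))
```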

18th-century mathematicians had mixed feelings about $\sqrt{-1}$. They
were willing to use it en route to results about real numbers but doubted
that it had a concrete meaning of its own. Cotes (1714) even used $a + \sqrt{-1}\,b$
to represent the point $(a, b)$ in the plane (as Euler did later), apparently
without noticing that $(a, b)$ was a valid interpretation of $a + \sqrt{-1}\,b$. Since
results about $\sqrt{-1}$ were suspect, they were often left unstated when it was
possible to state an equivalent result about reals. This may explain why
de Moivre stated (1) but not (2). Another example of the avoidance of
results about $\sqrt{-1}$ is the remarkable theorem on the regular $n$-gon discovered
by Cotes in 1716 and published posthumously in Cotes (1722):


If $A_0, \ldots, A_{n-1}$ are equally spaced points on the unit circle with center
$O$, and if $P$ is a point on $OA_0$ such that $OP = x$, then (Figure 11.2)

$$PA_0 \cdot PA_1 \cdots PA_{n-1} = 1 - x^n.$$


Figure 11.2: Cotes’s theorem 


This theorem not only relates the regular $n$-gon to the polynomial $x^n - 1$
but in fact geometrically realizes the factorization of $x^n - 1$ into real linear
and quadratic factors. By symmetry one has $PA_1 = PA_{n-1}, \ldots$, so

$$PA_0 \cdot PA_1 \cdots PA_{n-1} = \begin{cases} PA_0 \cdot PA_1^2 \cdot PA_2^2 \cdots PA_{(n-1)/2}^2 & n \text{ odd}, \\ PA_0 \cdot PA_1^2 \cdot PA_2^2 \cdots PA_{n/2-1}^2 \cdot PA_{n/2} & n \text{ even}. \end{cases}$$

$PA_0 = 1 - x$ is a real linear factor, as is $PA_{n/2}$ when $n$ is even, and it follows
from the cosine rule in triangle $OPA_k$ that

$$PA_k^2 = 1 - 2x\cos\frac{2k\pi}{n} + x^2.$$

The easiest route from here to the theorem is by splitting $PA_k^2$ into complex
linear factors and using de Moivre’s theorem. We can only speculate that
this was Cotes’s method, since he stated his theorem without proof. The
theorem has a second half which similarly decomposes $1 + x^n$ into real
linear and quadratic factors. These factorizations were needed to integrate
$1/(1 + x^n)$ by resolution into partial fractions, which was Cotes’s main
objective. Such problems were then high on the mathematical agenda, and 
they motivated research into the factorization of polynomials, in particular 
the first attempts to prove the fundamental theorem of algebra. 
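Cotes's theorem is easy to test numerically once the points are represented by complex numbers. The sketch below assumes coordinates that the theorem itself does not fix: $O$ at the origin, $A_0$ at 1, so that $A_k = e^{2k\pi i/n}$ and $P$ is the real number $x$ with $0 \le x \le 1$. The function names are ours.

```python
import cmath
import math

def cotes_product(n, x):
    """Product PA_0 * PA_1 * ... * PA_{n-1} for P at distance x from O on OA_0.

    Assumed coordinates: O = 0, A_k = exp(2*pi*i*k/n), P = x (real, 0 <= x <= 1).
    """
    product = 1.0
    for k in range(n):
        a_k = cmath.exp(2j * math.pi * k / n)   # vertex A_k on the unit circle
        product *= abs(a_k - x)                 # distance PA_k
    return product

def squared_distance_by_cosine_rule(n, k, x):
    """PA_k^2 from the cosine rule in triangle OPA_k."""
    return 1 - 2 * x * math.cos(2 * k * math.pi / n) + x**2

n, x = 7, 0.6
print(cotes_product(n, x), 1 - x**n)            # the two values should agree

# The cosine-rule expression matches the squared distance computed directly
a_2 = cmath.exp(2j * math.pi * 2 / n)
print(abs(a_2 - x) ** 2, squared_distance_by_cosine_rule(n, 2, x))
```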
EXERCISES


Johann Bernoulli’s formula relating $y = \tan n\theta$ to $x = \tan\theta$ is false for some
values of $n$, because it neglects a possible constant of integration. The result of
integration should be

$$\log(y + i) - \log(y - i) = n\log(x + i) - n\log(x - i) + C,$$

for some $C$, leading to

$$\frac{y + i}{y - i} = D\,\frac{(x + i)^n}{(x - i)^n} \qquad (*)$$

for some constant $D$ (equal to $e^C$). Sometimes $D = 1$ gives the correct formula,
but sometimes we need $D = -1$.


11.3.1 Show that $D = 1$ gives the correct formula when $n = 1$.
11.3.2 Using formulas for $\sin 2\theta$ and $\cos 2\theta$, or otherwise, show that

$$\tan 2\theta = \frac{2\tan\theta}{1 - \tan^2\theta},$$

and check that this follows from (*) for $D = -1$, but not for $D = 1$.


11.3.3 Use the formula in Exercise 11.3.2 to express $\tan 4\theta$ in terms of $\tan 2\theta$, and
hence in terms of $\tan\theta$.


11.3.4 Letting $y = \tan 4\theta$ and $x = \tan\theta$, express the result of Exercise 11.3.3 as

$$y = \frac{4x - 4x^3}{x^4 - 6x^2 + 1},$$

and check that this follows from (*) when $D = -1$.


11.4 The Fundamental Theorem of Algebra 


The fundamental theorem of algebra is the statement that every polynomial 
equation $p(z) = 0$ has a solution in the complex numbers. As Descartes
observed (Section 5.7), a solution $z = a$ implies that $p(z)$ has a factor $z - a$.
The quotient $q(z) = p(z)/(z - a)$ is then a polynomial of lower degree;
hence if every polynomial equation has a solution, we can also extract a 
factor from q(z), and if p(z) has degree n, we can go on to factorize p(z) 
into n linear factors. The existence of such a factorization is of course 
another way to state the fundamental theorem. 

Initially, interest was confined to polynomials $p(z)$ with real coefficients,
and in this case d’Alembert (1746) observed that if $z = u + iv$ is a
solution of $p(z) = 0$, then so is its conjugate $\bar{z} = u - iv$. Thus the imaginary
linear factors of a real $p(z)$ can always be combined in pairs to form real
quadratic factors:

$$(z - u - iv)(z - u + iv) = z^2 - 2uz + (u^2 + v^2).$$

This gave another equivalent of the fundamental theorem: each (real) polynomial
$p(z)$ can be expressed as a product of real linear and quadratic factors.
The theorem was usually stated in this way during the 18th century,
when its main purpose was to make possible the integration of rational
functions (see previous section). This also avoided mention of $\sqrt{-1}$.

It has often been said that attempts to prove the fundamental theorem 
began with d’Alembert (1746), and that the first satisfactory proof was 
given by Gauss (1799). This opinion should not be accepted without ques- 
tion, since the source of it is Gauss himself. Gauss (1799) gave a critique of 
proofs from d’Alembert on, showing that they all had serious weaknesses,
then offered a proof of his own. He wanted to convince readers that the new 
proof was the first valid one, even though it used one unproved assumption 
(which is discussed further in the next section). The opinion as to which of 
two incomplete proofs is more convincing can of course change with time, 
and I believe that Gauss (1799) might be judged differently today. We can 
now fill the gaps in d’Alembert (1746) by appeal to standard methods and
theorems, whereas there is still no easy way to fill the gap in Gauss (1799). 
This was first done by Ostrowski (1920). 

Both proofs depend on the geometric properties of the complex num- 
bers and the concept of continuity for their completion. The basic geo- 
metrical insight—that the complex number x + iy can be identified with 
the point (x, y) in the plane—mysteriously eluded all mathematicians until 
the end of the 18th century. This was one of the reasons that d’Alembert’s
proof was unclear, and the use of this insight by Argand (1806) was an
important step in d’Alembert’s reinstatement. Gauss seems to have had the
same insight but concealed its role in his proof, perhaps believing that his 
contemporaries were not ready to view the complex numbers as a plane. 

As for the concept of continuity, neither Gauss nor d’Alembert understood
it very well. Gauss (1799) seriously understated the difficulties
involved in the unproved step, claiming that “no one, to my knowledge, 
has ever doubted it. But if anybody desires it, then on another occasion 
I intend to give a demonstration which will leave no doubt” (translation 
from Struik (1969), p. 121). Perhaps seeing the difficulty on further
reflection, he gave a second proof, Gauss (1816), in which the role of continuity
was minimized. The second proof is purely algebraic except for the use 
of a special case of the intermediate value theorem. Gauss assumed that a 
polynomial function p(x) of a real variable x takes all values between p(a) 
and p(b) as x runs from a to b (which implies that a polynomial of odd 
degree takes the value 0). 

The first to appreciate the importance of continuity for the fundamen- 
tal theorem of algebra was Bolzano (1817), who proved the continuity of 
polynomial functions and attempted a proof of the intermediate value the- 
orem. The latter proof was unsatisfactory because Bolzano had no clear 
concept of real number on which to base it, but it did point in the right 
direction. When a definition of real numbers emerged in the 1870s (for 
example, with Dedekind cuts; Section 4.2), Weierstrass (1874) rigorously 
established the basic properties of continuous functions, such as the inter- 
mediate value theorem and extreme value theorem. This completed not 
only the second proof of Gauss but also the proof of d’Alembert, as we
will see in the next subsection. 


The Idea of d’Alembert 


The key to d’Alembert’s proof is a proposition now known as d’Alembert’s
lemma: if p(z) is a nonconstant polynomial function and $p(z_0) \neq 0$, then
any neighborhood of $z_0$ contains a point $z_1$ such that $|p(z_1)| < |p(z_0)|$.

The proof of this lemma offered by d’Alembert depended on solving 
the equation w = p(z) for z as a fractional power series in w. As mentioned 
in Section 9.4, such a solution was claimed by Newton (1671), but it was 
made clear and rigorous only by Puiseux (1850). Thus d’Alembert’s argument
did not stand on solid ground, and in any case it was unnecessarily
complicated. 

A simple elementary proof of d’Alembert’s lemma was given by 
Argand (1806). Argand was one of the co-discoverers of the geometric 
representation of complex numbers (probably the first was Wessel (1797), 
but his work remained almost unknown for 100 years), and he offered the 
following proof as an illustration of the effectiveness of the representation. 

The value of $p(z_0) = x_0 + iy_0$ is interpreted as the point $(x_0, y_0)$
in the plane, so that $|p(z_0)|$ is the distance of $(x_0, y_0)$ from the
origin. We wish to find a $\Delta z$ such that $p(z_0 + \Delta z)$ is nearer
to the origin than $p(z_0)$. If

$$p(z) = a_0 z^n + a_1 z^{n-1} + \cdots + a_n,$$

then

$$\begin{aligned}
p(z_0 + \Delta z) &= a_0 (z_0 + \Delta z)^n + a_1 (z_0 + \Delta z)^{n-1} + \cdots + a_n \\
&= a_0 z_0^n + a_1 z_0^{n-1} + \cdots + a_n + A_1 \Delta z + A_2 (\Delta z)^2 + \cdots + A_n (\Delta z)^n \\
&= p(z_0) + A\,\Delta z + \varepsilon,
\end{aligned}$$

for some constants $A_i$ depending on $z_0$, not all zero because p is not
constant. Here $A\,\Delta z = A_i (\Delta z)^i$ is the term containing the
first nonzero $A_i$, and $|\varepsilon|$ is small compared with
$|A\,\Delta z|$ when $|\Delta z|$ is small (because $\varepsilon$ contains
higher powers of $\Delta z$). It is then clear (Figure 11.3) that by
choosing the direction of $\Delta z$ so that $A\,\Delta z$ is opposite in
direction to $p(z_0)$, we get $|p(z_0 + \Delta z)| < |p(z_0)|$. This
completes the proof of d’Alembert’s lemma.
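
Argand's argument is constructive enough to be tried numerically. The
Python sketch below is only an illustration under stated assumptions: the
helper names (`evaluate`, `taylor_coeffs`, `argand_step`), the starting
radius, and the threshold 1e-14 used to decide that a coefficient is
"nonzero" are all choices made here, not part of the text. It computes the
constants A_i by repeated synthetic division, picks the direction of the
step so that the first nonzero term points opposite to p(z0), and shrinks
the step until the higher powers are negligible.

```python
import cmath


def evaluate(a, z):
    """Horner evaluation of p(z) = a[0] z^n + a[1] z^(n-1) + ... + a[n]."""
    result = 0
    for coeff in a:
        result = result * z + coeff
    return result


def taylor_coeffs(a, z0):
    """Return [p(z0), A_1, ..., A_n], the coefficients of p(z0 + dz) in powers of dz."""
    c = list(a)
    out = []
    while c:
        # Synthetic division by (z - z0); the remainder is the next coefficient.
        for j in range(1, len(c)):
            c[j] += c[j - 1] * z0
        out.append(c.pop())
    return out


def argand_step(a, z0, radius=1e-2):
    """Return z1 near z0 with |p(z1)| < |p(z0)|; assumes p(z0) != 0 and p nonconstant."""
    coeffs = taylor_coeffs(a, z0)
    p0 = coeffs[0]
    # Index and value of the first nonzero A_i with i >= 1 (tolerance is an assumption).
    i, A_i = next((k, A) for k, A in enumerate(coeffs) if k >= 1 and abs(A) > 1e-14)
    # Choose arg(dz) so that A_i * dz**i points opposite to p(z0).
    theta = (cmath.phase(-p0) - cmath.phase(A_i)) / i
    dz = cmath.rect(radius, theta)
    # Shrink dz, keeping its direction, until the higher-order terms no longer matter.
    while abs(evaluate(a, z0 + dz)) >= abs(p0):
        dz /= 2
    return z0 + dz


a = [1, 0, 1]            # p(z) = z^2 + 1, for which A_1 = 0 at z0 = 0
z1 = argand_step(a, 0)
print(abs(evaluate(a, 0)), abs(evaluate(a, z1)))
```

The example p(z) = z^2 + 1 at z0 = 0 was chosen because A_1 vanishes there,
so the step must be aligned using A_2, which is exactly the situation the
phrase "the first nonzero A_i" is meant to cover.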


[Figure: the origin O and the points $p(z_0)$, $p(z_0) + \varepsilon$, and
$p(z_0) + \varepsilon + A\,\Delta z = p(z_0 + \Delta z)$.]

Figure 11.3: Construction for d’Alembert’s lemma


To complete the proof of the fundamental theorem of algebra, take an
arbitrary polynomial p and consider the continuous function |p(z)|. Since
$p(z) \approx a_0 z^n$ for |z| large, |p(z)| increases with |z| outside a
sufficiently large circle |z| = R. We now get a z for which |p(z)| = 0 from
the extreme value theorem of Weierstrass (1874): a continuous function on a
closed bounded set assumes maximum and minimum values. By this theorem,
|p(z)| takes a minimum value for $|z| \le R$. The minimum is $\ge 0$ by
definition, and if it is > 0 we get a contradiction by d’Alembert’s lemma:
either a point z with $|z| \le R$ where |p(z)| takes a value less than its
minimum or a point z with |z| > R where |p(z)| is less than its values on
|z| = R. Thus there is a point z where |p(z)| = 0 and hence p(z) = 0.

From our present perspective, d’Alembert’s route to the fundamental
theorem of algebra seems basically easy because it proceeds through gen- 
eral properties of continuous functions. The route of Gauss seems equally 
easy from a distance, but it goes through the still-unfamiliar territory of 
real algebraic curves. The intersections of real algebraic curves are harder 
to understand than the intersections of complex algebraic curves, and in 
retrospect they are harder to understand than the fundamental theorem of 
algebra. Indeed, as we will see in the next section, the fundamental theo- 
rem gives us Bézout’s theorem, which in turn settles the problem of count- 
ing the intersections of complex algebraic curves. 


EXERCISES 


Complex roots of an equation with real coefficients occur in conjugate pairs 
because of the fundamental properties of conjugates. 


11.4.1 Show directly from the definition $\overline{u + iv} = u - iv$ that
$$\overline{z_1 + z_2} = \bar{z}_1 + \bar{z}_2 \quad\text{and}\quad \overline{z_1 z_2} = \bar{z}_1 \cdot \bar{z}_2$$
for any complex numbers $z_1$, $z_2$.


11.4.2 Deduce from Exercise 11.4.1 that $p(\bar{z}) = \overline{p(z)}$ for any
polynomial p(z) with real coefficients, and hence that the complex roots of
p(z) = 0 occur in conjugate pairs.


The expression in d’Alembert’s lemma for $p(z_0 + \Delta z)$ is an instance
of Taylor’s series, previously discussed in Section 9.3. When the function
is a polynomial p, as here, its Taylor series is finite because p has only
finitely many nonzero derivatives.

11.4.3 Show that $A_1 = n a_0 z_0^{n-1} + (n-1) a_1 z_0^{n-2} + \cdots + a_{n-1}$
and that the latter expression is $p'(z_0)$.


11.4.4 Show that
$A_2 = \frac{n(n-1)}{2} a_0 z_0^{n-2} + \frac{(n-1)(n-2)}{2} a_1 z_0^{n-3} + \cdots + a_{n-2}$
and that the latter expression is $p''(z_0)/2$.


11.4.5 Using the binomial theorem, show that $A_k = p^{(k)}(z_0)/k!$, and
hence that
$$p(z_0 + \Delta z) = a_0 z_0^n + a_1 z_0^{n-1} + \cdots + a_n + A_1 \Delta z + A_2 (\Delta z)^2 + \cdots + A_n (\Delta z)^n$$
is an instance of the Taylor series formula.


11.5 Roots and Intersections 


There is a close connection between intersections of algebraic curves and 
roots of polynomial equations, going back as far as Menaechmus’s
construction of $\sqrt[3]{2}$ (a root of the equation $x^3 = 2$) by
intersecting a parabola and a hyperbola (Section 2.4). The most direct
connection, of course,
occurs in the case of a polynomial curve 


y = p(x) (1) 


whose intersections with the axis y = 0 are just the real roots of the equa- 
tion 


p(x) = 0. (2) 


If (2) has k real roots, then the curve (1) has k intersections with the axis 
y = 0. Here we must count intersections the same way we count roots, 
according to multiplicity. A root r of (2) has multiplicity $\mu$ if the factor
(x − r) occurs $\mu$ times in p(x), and the root r is then counted $\mu$ times.

This way of counting is also geometrically natural because if, for
example, the curve y = p(x) meets the axis y = 0 with multiplicity 2 at 0,
then a line $y = \varepsilon x$ close to the axis meets the curve twice:
once near the intersection with the axis and once precisely there. The
intersection of $y = x^2$ with y = 0 (Figure 11.4) can therefore be
considered as two coincident points to which the distinct intersections
with $y = \varepsilon x$ tend as $\varepsilon \to 0$. Likewise, an
intersection of multiplicity 3 can be explained as the limit of three
distinct intersections, for example, of $y = \varepsilon x$ with $y = x^3$
(Figure 11.5).


[Figure: the curve with the line $y = \varepsilon x$ and the axis y = 0.]
Figure 11.4: Intersection of multiplicity 2

[Figure: the curve with the line $y = \varepsilon x$ and the axis y = 0.]
Figure 11.5: Intersection of multiplicity 3

At first glance this idea seems to break down with multiplicity 4, since
$y = \varepsilon x$ meets $y = x^4$ at only two points, x = 0 and
$x = \sqrt[3]{\varepsilon}$. The explanation is that there are also two
complex roots in this case ($\sqrt[3]{\varepsilon}$ times the two complex
cube roots of 1), hence we cannot neglect complex roots if we want to get
the geometrically correct number of intersections.
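
The count can be checked directly. The Python sketch below (the function
name `intersections` and the sample value of ε are illustrative choices,
not from the text) lists the x-coordinates where y = εx meets y = x^n,
namely x = 0 together with the (n−1)th roots of ε, and separates the real
ones from the complex ones for n = 4.

```python
import cmath


def intersections(n, eps):
    """x-coordinates where y = eps*x meets y = x**n: 0 and the (n-1)th roots of eps."""
    r = abs(eps) ** (1.0 / (n - 1))
    phi = cmath.phase(eps) / (n - 1)
    roots = [cmath.rect(r, phi + 2 * cmath.pi * k / (n - 1)) for k in range(n - 1)]
    return [0j] + roots


xs = intersections(4, 1e-3)
real_points = [x for x in xs if abs(x.imag) < 1e-12]
complex_points = [x for x in xs if abs(x.imag) >= 1e-12]
print(len(xs), len(real_points), len(complex_points))  # 4 2 2
```

For positive real ε, only two of the four intersections are visible in the
real plane; the other two appear once complex values of x are admitted.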

The fundamental theorem of algebra (previous section) gives us n roots of
an nth-degree equation (2) and hence n intersections of the polynomial
curve (1) with the axis y = 0. To get n roots, however, we have to admit
complex values of x, so we have to consider “curves” for which x and y are
complex in order to obtain n intersections. This, and other tidy
consequences of the fundamental theorem of algebra (for example, the
“coincident point” interpretation of multiplicity; see Exercise 11.5.1),
persuaded 18th-century mathematicians to admit complex numbers into the
theory of curves before complex numbers themselves were understood, and
even before the fundamental theorem of algebra was proved.

The most elegant consequence was Bézout’s theorem that a curve $C_m$ of
degree m meets a curve $C_n$ of degree n at mn points. As we saw in
Section 7.7, if homogeneous coordinates are used to take account of points
at infinity, then the intersections of $C_m$ and $C_n$ correspond to the
solutions of an equation $r_{mn}(x, y) = 0$, which is homogeneous of degree
mn. We can now use the fundamental theorem of algebra to show that
$r_{mn}(x, y)$ is the product of mn linear factors as follows:

$$r_{mn}(x, y) = y^{mn}\, r_{mn}\!\left(\frac{x}{y}, 1\right)
= y^{mn} \prod_{i=1}^{p} \left(b_i \frac{x}{y} - a_i\right) \quad\text{for some } p \le mn$$

by the fundamental theorem, since $r_{mn}(x/y, 1)$ is a polynomial of degree
$p \le mn$ in the single variable x/y. But then

$$r_{mn}(x, y) = y^{mn-p} \prod_{i=1}^{p} (b_i x - a_i y)
= \prod_{i=1}^{mn} (b_i x - a_i y),$$

since each factor y in front (if any) is trivially of the form
$b_i x - a_i y$.

It follows that the equation $r_{mn}(x, y) = 0$ has mn solutions, and hence
there are mn intersections of $C_m$ and $C_n$, counting multiplicities.


EXERCISES


11.5.1 Show that $y = \varepsilon x$ meets $y = x^n$ in n distinct points
when $\varepsilon \neq 0$, and list them (for example, with the help of
de Moivre’s theorem).


If a curve K has a double point at O, then a line y = tx may have double
contact with K at O even though nearby lines $y = (t + \varepsilon)x$ do not
meet K at nearby points other than O. In this case the double contact may
be explained as contact with the two branches of the curve at O.


11.5.2 Consider the lines y = tx through the double point O of
$y^2 = x^2(x + 1)$. Show that each such line has double contact with the
curve at O, except when $t = \pm 1$. How do you account for the
multiplicities when $t = \pm 1$?


11.5.3 Show that y = tx also has double contact with $y^2 = x^3$ at its cusp
point O. Try to explain this by viewing $y^2 = x^3$ as the result of
shrinking the loop of $y^2 = x^2(x + \varepsilon)$ (letting
$\varepsilon \to 0$).


11.5.4 Show that the line y = tx has double contact at O with the lemniscate
$(x^2 + y^2)^2 = x^2 - y^2$ except for two values of t, for which it has
quadruple contact.


11.5.5 Explain the multiplicities found in Exercise 11.5.4 with the help of the 
known shape of the lemniscate (Figure 10.4). 


11.6 The Complex Projective Line 


We saw in Section 7.5 that adding a point at infinity to the real line R in
$R \times R$ forms a closed curve that is qualitatively like a circle.
Indeed, a real projective line in the sphere model of the real projective
plane $RP^2$ has much the same geometric properties as a great circle on a
sphere, after one allows for the fact that antipodal points on the sphere
are the same point on $RP^2$. The situation with the complex “line” C is
similar but more difficult
to visualize. C is already two-dimensional, as we saw in Gauss’s proof of 
the fundamental theorem of algebra; hence the complex “plane” C x C is 
four-dimensional and virtually impossible to visualize. 

To avoid an excursion into four-dimensional space, we first revise our 
approach to the real projective line. In Section 7.5 we considered ordinary
lines L in a horizontal plane not passing through the origin, and extended
each to a projective line whose “points” are the lines through the origin
O, in the plane through O and L. The nonhorizontal lines in this family
correspond to points of L, and the horizontal line in the family to the point
at infinity of L. We now use this construction again to demonstrate directly
the qualitative, or more precisely topological, equivalence between a pro- 
jective line and a circle (Figure 11.6). 


Figure 11.6: The real projective line


The origin N is taken to be the top point of a circle that, at its bottom 
point, touches our line L = R. There is a continuous one-to-one
correspondence between lines through N and points of the circle. Each
nonhorizontal line corresponds to its intersection $x' \neq N$ with the
circle, while the horizontal line corresponds to N itself. Thus the
projective completion of R, which we now call $RP^1$, is topologically the
same as the circle, in the
sense that there is a continuous one-to-one correspondence between them. 
Moreover, we can understand projective completion of R topologically as 
a process of adding one “point” that is “approached” as one tends to infin- 
ity, in either direction, along R, for as x tends to infinity in either direction, 
x’ tends to the same point, N, on the circle. 

We can now view projective completion of C in the same way using 
Figure 11.7, which shows what is called stereographic projection of the 
plane C into a sphere. Each point $z \in C$ is projected to a point $z'$ on
the tangential sphere S by the ray through z and the north pole N of S. This
establishes a continuous one-to-one correspondence between points z of C
and points $z' \neq N$ on S. Moreover, as z tends to infinity in any
direction, $z'$ tends to N; hence the projective completion $CP^1$ of C is
topologically the same as the complete sphere S, with the point at $\infty$
of C corresponding to N.
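
The projection can be written down explicitly once coordinates are chosen.
The Python sketch below assumes, purely for illustration, that C is the
horizontal plane at height 0 and that S is the sphere of diameter 1 resting
on the plane at 0, so that its centre is (0, 0, 1/2) and its north pole is
N = (0, 0, 1); the text does not fix these conventions. The point z' is
then where the ray from N through z meets S.

```python
def stereographic(z):
    """Image z' on the sphere S of a point z of the plane C.

    Assumed setup: S has centre (0, 0, 1/2) and radius 1/2, touching the
    plane at the origin, with north pole N = (0, 0, 1).
    """
    z = complex(z)
    x, y = z.real, z.imag
    s = 1.0 / (1.0 + x * x + y * y)   # parameter along the ray from N through (x, y, 0)
    return (s * x, s * y, 1.0 - s)


for z in (0, 1, 10, 1000j):
    print(z, stereographic(z))
```

As |z| grows the printed points approach N = (0, 0, 1), whatever the
direction of z, which is the sense in which a single point at infinity
completes C to a sphere.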

Figure 11.7: The complex projective line


Since one also wants to complete C by a point $\infty$ in this way for
complex analysis, geometry and analysis are both served by passing from C
to $CP^1$. Gauss seems to have been the first to appreciate the advantages
of $C \cup \{\infty\}$ over C; hence one often calls $CP^1$ the Gauss sphere
in analysis. (Unfortunately, only a few unpublished, undated fragments of
Gauss’s work on this topic seem to have survived; see Gauss (1819).)
Algebraic geometers call $CP^1$ the (complex) projective line, since it is
the formal equivalent of a real line, even though it is topologically a
surface. Similarly, complex curves are topologically surfaces, known to
analysts as Riemann surfaces, though algebraic geometers prefer to call
them “curves.”


The “surface” viewpoint is helpful when studying intrinsic properties 
of complex curves. For example, genus (introduced in connection with 
parameterization in Sections 10.2 and 10.3) turns out to have a very simple 
meaning in the topology of surfaces (see Section 15.3). On the other hand, 
the “curve” viewpoint is helpful when studying intersections of curves and 
their embedding in $C \times C$ or its projective completion $CP^2$. Instead of trying
to imagine two planes meeting in a single point of C x C, for example, it 
is better to imagine the intersection as analogous to that of real lines in a 
real plane—as the single solution of two linear equations. After all, we are 
working with C to remove anomalies that occur with R, not for the sake 
of doing something different, and we expect that much of the behavior of 
real curves will recur with complex ones. 


EXERCISES


Since addition and multiplication are continuous functions, it is quite easy to
find one-to-one continuous maps between certain complex algebraic curves and
the sphere.


11.6.1 Show that the projective completion of the curve $Y = X^2$ is
topologically a sphere by considering its parameterization
$$X = t, \quad Y = t^2,$$
where t ranges over the sphere $C \cup \{\infty\}$. Namely, show that the
mapping $t \mapsto (t, t^2)$ is one-to-one and continuous.


11.6.2 Similarly show that the projective completion of $Y^2 = X^3$ is
topologically a sphere by considering its parameterization
$$X = t^2, \quad Y = t^3$$
and the continuous mapping $t \mapsto (t^2, t^3)$.


11.6.3 Consider the mapping of the t sphere onto the projective completion
of $Y^2 = X^2(X + 1)$ defined by $t \mapsto P(t)$, where P(t) is the third
intersection of the curve with the line Y = tX through the double point
(found in Exercise 6.4.2). Show that this mapping is continuous and that it
is one-to-one except at the points $t = \pm 1$, which are both mapped to the
point O on the curve. Conclude that the curve is topologically the same as
a sphere with two points identified (Figure 11.8).


Figure 11.8: A singular sphere


11.7 Branch Points


The key to the topological form of a complex curve p(x, y) = 0 lies in
its branch points, the points a where the Newton–Puiseux expansion of y
begins with a fractional power of (x − a) (see Section 9.4). The nature of
branch points was first described by Riemann (1851) as part of a
revolutionary new geometric theory of complex functions. Riemann’s idea,
one of the most illuminating in the history of mathematics, was to
represent a relation p(x, y) = 0 between complex x and complex y by
covering a plane (or sphere) representing the x variable by a surface
representing the y variable, the point or points of the y surface over a
given point x = a being those values of y that satisfy p(a, y) = 0.

If the equation p(a, y) = 0 is of degree n in y, there will in general
be n distinct y values for a given a, consequently n sheets of the y surface
lying over the x-plane in the neighborhood of x = a. At finitely many
exceptional values of x, sheets merge due to coincidence of roots, and the
Newton–Puiseux theory says that at such a point y behaves like a fractional
power of x at 0. Our main problem, therefore, is to understand the behavior
of the Riemann surface for $y = x^{m/n}$ in the neighborhood of 0.

The idea can be grasped sufficiently well from seeing the special case
$y = x^{1/2}$. If we consider the unit disk in the y-plane and try to deform
it so that the points $y = \pm\sqrt{x}$ lie above the point x in the unit
disk of the x-plane, then the result is something like Figure 11.9.

The angles $\theta$ on the disk boundaries are the arguments of the
corresponding points $e^{i\theta} = \cos\theta + i\sin\theta$, as we explain
in Section 12.1. If

$$x = e^{i\theta} = e^{i(\theta + 2\pi)},$$

then

$$y = e^{i\theta/2},\ e^{i(\theta/2 + \pi)},$$

giving the values shown.
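
The way the two values of y exchange places around the branch point can be
seen numerically. The Python sketch below is an illustration only; the
function name `continue_sqrt` and the step count are choices made here. It
follows a square root of x continuously while x travels around the unit
circle, at each step taking whichever of the two roots lies nearer to the
previous value.

```python
import cmath


def continue_sqrt(turns=1, steps_per_turn=1000):
    """Follow y with y**2 = x continuously as x = e^{i*theta} circles 0, starting from y = 1."""
    y = 1 + 0j
    total = turns * steps_per_turn
    for k in range(1, total + 1):
        x = cmath.exp(2j * cmath.pi * k / steps_per_turn)
        r = cmath.sqrt(x)                 # one of the two square roots, r and -r
        # Keep y continuous: take the root nearer to the previous value.
        y = r if abs(r - y) <= abs(r + y) else -r
    return y


print(continue_sqrt(1))  # close to -1: one circuit swaps the two sheets
print(continue_sqrt(2))  # close to +1: two circuits restore the starting value
```

One circuit of x around 0 carries y from one sheet to the other, and only
after a second circuit does y return to where it began, which is the
behavior the angles θ/2 and θ/2 + π describe.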

Figure 11.9: Branch point for the square root


It should be noted that the awkward appearance of the branch point, in
particular the line of self-intersection, is a consequence of representing
the relation $y^2 = x$ in fewer dimensions than the four it really requires.
If we similarly attempt to represent the relation $y^2 = x$ between real x
and y by laying the y-axis along the x-axis so that $y = \pm\sqrt{x}$ are on
top of x, then the result is an awkward folded “branch point” at 0
(Figure 11.10). This is a consequence of trying to represent the relation
in one dimension. In reality, as the second part of the figure shows, when
viewed as a curve in the plane the relation is just as smooth at 0 as
anywhere else. (Notice, incidentally, that the folded line in Figure 11.10,
the real y-axis, corresponds to the self-intersection line in Figure 11.9.)


11.8 Topology of Complex Projective Curves 


Figure 11.10: Branch point in one dimension and two


To understand the complete structure of the complex projective curve
defined by $y^2 = x$ we need to know its behavior at infinity. At $\infty$
there is another branch point like the one at 0 (just replace x by 1/u and
y by 1/v and notice that we are looking at $v^2 = u$ near u = 0, v = 0, the
same situation as before). The topological nature of the relation between x
and y can then be captured by the model seen in Figure 11.11. The sphere of
x values is covered by two spheres (like skins of an onion), slit along a
line from 0 to $\infty$ and cross-joined by pasting the red edges together
and the purple edges together. The slit from 0 to $\infty$ is arbitrary, but
the cross-joining is needed to produce the branch point structure at 0 and
$\infty$.


Figure 11.11: Covering the sphere 


The covering of the x sphere by this two-sheeted surface expresses
the covering projection map $(x, y) \mapsto x$ from a general point on the
curve $y^2 = x$ to its x coordinate and shows that it is two-to-one except
at the branch points 0, $\infty$. The two-sheeted surface itself captures the intrinsic
topological structure of the curve, and this structure can be more readily 
seen by separating the two skins from the x sphere and each other, then 
joining the required edges (Figure 11.12). Edges to be joined are given the 
same color, and we see that the resulting surface is topologically a sphere. 

Figure 11.12: Joining the separated sheets


This result could have been obtained more directly by projecting each
point (x, y) on the curve to y, since this is a one-to-one continuous map
between the curve and the y-axis, which we know to be topologically a
sphere (when $\infty$ is included). The curve here was modeled by cutting
and joining sheets on the sphere because this method extends to all
algebraic curves. The Newton–Puiseux theory implies that any algebraic
relation p(x, y) = 0 can be modeled by a finite-sheeted covering of the
sphere, with finitely many branch points. The most general branch point
structure is given by a prescription for cross-joining (permuting) the
sheets, and by slitting the sheets between branch points (or, if necessary,
to an auxiliary point) they can be rejoined to produce the prescribed
branching behavior. The most interesting case of this method is the cubic
curve


$$y^2 = x(x - \alpha)(x - \beta).$$


This relation defines a covering of the x sphere that is two-sheeted, since
for each x there are + and − values for y, with branch points at 0,
$\alpha$, $\beta$, and $\infty$. (The branch point at $\infty$ is explained
in the exercises below.) Thus if we slit the sheets from 0 to $\alpha$ and
from $\beta$ to $\infty$, the required joining is by pasting like-colored
edges, as shown in Figure 11.13.


Figure 11.13: Joining the sheets of a cubic curve 


We find, as Riemann did, that the surface is a torus, and hence not 
topologically the same as a sphere. This discovery illuminated the theory 
of cubic curves and elliptic functions, as we will see in the next chapter.

One quickly sees that relations of the form

$$y^2 = (x - \alpha_1)(x - \alpha_2) \cdots (x - \alpha_{2n})$$

yield Riemann surfaces of all the forms shown in Figure 11.14. These sur- 
faces are distinguished topologically from each other by the number of 
“holes”: 0 for the sphere, 1 for the torus, and so on. This simple
topological invariant turns out to be the genus, which also determines the type of
functions that can parameterize the corresponding complex curve. Other 
geometric and analytic properties of genus will unfold over the next few 
chapters. The topological importance of genus was established by Möbius
(1863), when he showed that any closed surface in ordinary space is topo- 
logically equivalent to a sphere or one of the forms seen in Figure 11.14. 
For more on genus, see Chapter 15. 


Figure 11.14: Riemann surfaces of genus 1, 2, 3, ... 


EXERCISES 


We can transfer the “one-dimensional branch point” (Figure 11.10) to infinity
to see the topology of the real projective curve $y^2 = x$.


11.8.1 Explain why the real projective curve $y^2 = x$ has a branch point at
infinity like the one at 0, and hence conclude that this curve is
topologically a circle.


We can explain the branch point at infinity of a cubic curve as follows.

11.8.2 Use the substitution x = 1/u, y = 1/v to show that the curve
$$y^2 = x(x - \alpha)(x - \beta)$$
behaves at infinity as the curve
$$v^2 = u^3 (1 - u\alpha)^{-1} (1 - u\beta)^{-1}$$
does at 0, which in turn is qualitatively like the behavior of
$$v = u^{3/2}.$$


11.8.3 Show, by considering the points lying above $u = e^{i\theta}$, that
$v = u^{3/2}$ has a branch point at 0 like that of $v = u^{1/2}$.


Complex Numbers and
Functions 


PREVIEW 


The insight into algebraic curves afforded by complex coordinates—that a 
complex curve is topologically a surface—has important implications for 
functions defined as integrals of algebraic functions, such as the logarithm, 
exponential, and elliptic functions. 

The complex logarithm turns out to be many-valued, due to the differ- 
ent paths of integration in the complex plane between the same endpoints. 
It follows that its inverse function, the exponential function, is periodic. In 
fact, the complex exponential function is a fusion of the real exponential 
function with the sine and cosine: $e^{x+iy} = e^x(\cos y + i\sin y)$.

The double periodicity of elliptic functions also becomes clear from 
the complex viewpoint. The integrals that define them are taken over paths 
on a torus surface, on which there are two independent closed paths. 

The two-dimensional nature of complex numbers imposes interesting 
and useful constraints on the nature of differentiable complex functions. 
Such functions define conformal (angle-preserving) maps between sur- 
faces. Also, their real and imaginary parts satisfy equations, called the 
Cauchy—Riemann equations, that govern fluid flow. So complex functions 
can be used to study the motion of fluids. 

Finally, the Cauchy—Riemann equations imply Cauchy’s theorem. This 
fundamental theorem guarantees that differentiable complex functions have 
many good features, such as power series expansions.


12.1 Complex Functions


When Bombelli (1572) introduced complex numbers, he implicitly introduced
complex functions as well. The solution y of the cubic equation

$$y^3 = py + q,$$

namely

$$y = \sqrt[3]{\frac{q}{2} + \sqrt{\left(\frac{q}{2}\right)^2 - \left(\frac{p}{3}\right)^3}}
+ \sqrt[3]{\frac{q}{2} - \sqrt{\left(\frac{q}{2}\right)^2 - \left(\frac{p}{3}\right)^3}},$$

involves the cube root of a complex argument when $(q/2)^2 < (p/3)^3$.
It could have been a revelation to see that complex numbers explain the
coincidence of algebraic (Cardano) and geometric (Viète) solutions of the
cubic equation, and more generally the Leibniz–de Moivre theorem that

$$x = \frac{1}{2}\sqrt[n]{y + \sqrt{y^2 - 1}} + \frac{1}{2}\sqrt[n]{y - \sqrt{y^2 - 1}},$$

when $x = \sin\theta$ and $y = \sin n\theta$ (Section 5.6). In the case of
the cubic, this revelation can now be savored in Needham (1997), pp. 59-60.
But mathematicians were not concerned about the meaning of these complex
functions as long as they produced results that could be checked by
algebra.
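
To see a cube root of a complex argument produce a result that can be
checked by algebra, the Python sketch below evaluates Cardano's formula in
complex arithmetic. The function name `cardano` is a hypothetical helper,
and the test equation y^3 = 15y + 4, the classic example usually
attributed to Bombelli, is brought in here for illustration rather than
taken from this section. The second cube root is obtained as (p/3)/a so
that the two cube roots are correctly paired (their product must be p/3).

```python
import cmath


def cardano(p, q):
    """One root of y^3 = p*y + q via Cardano's formula, using complex arithmetic."""
    d = cmath.sqrt((q / 2) ** 2 - (p / 3) ** 3)
    # Use the sign of the square root giving the larger radicand, to avoid dividing by ~0.
    if abs(q / 2 - d) > abs(q / 2 + d):
        d = -d
    a = (q / 2 + d) ** (1 / 3)          # principal complex cube root
    # Pair the second cube root with the first so that a * b = p / 3.
    b = (p / 3) / a if a != 0 else 0
    return a + b


y = cardano(15, 4)
print(y)                      # numerically very close to 4
print(y**3 - (15 * y + 4))    # numerically very close to 0
```

Here (q/2)^2 − (p/3)^3 = −121, so the formula passes through the cube roots
of 2 ± 11i, which are 2 ± i, and their sum is the real root 4.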


The need to understand complex functions became pressing only with 
transcendental functions, particularly those defined by integration. A key 
example is the logarithm function, which comes from integrating 
dz/(1 + z). Once this function was understood, the reason for algebraic 
miracles like the Leibniz—de Moivre theorem became much clearer. 


Johann Bernoulli (1702) opened the story of the complex logarithm
when he noted that

$$\frac{dz}{1 + z^2} = \frac{dz}{2(1 + z\sqrt{-1})} + \frac{dz}{2(1 - z\sqrt{-1})}$$

and drew the conclusion that “imaginary logarithms express real circular
sectors.” He did not actually perform the integration, but he may have
found

$$\tan^{-1} z = \frac{1}{2i} \log\frac{i - z}{i + z},$$

since Euler gives him credit for a similar formula when writing to him
in Euler (1728b). However, this may have been the young Euler’s deference
to his former teacher, because Johann Bernoulli showed poor understanding
of logarithms as the correspondence continued. He persistently
claimed that log(−x) = log(x) on the grounds that

$$\frac{d}{dx}\log(-x) = \frac{1}{x} = \frac{d}{dx}\log(x)$$


despite a reminder from Euler (1728b) that equality of derivatives does 
not imply equality of integrals. Euler went on to suggest that the complex 
logarithm had infinitely many values. 

In the meantime, Cotes (1714) had also discovered a relation between 
complex logarithms and circular functions: 


log(cos x + isin x) = ix. 


Recognizing the importance of this result, he entitled his work Harmonia 
mensurarum (Harmony of measures). The “measures” in question were the 
logarithm and inverse tangent functions, which measure the hyperbola and 
the circle, respectively, via the integrals $\int dx/(1 + x)$ and $\int dx/(1 + x^2)$. A
wide class of integrals had been reduced to these two types, but it was not 
understood why two apparently unrelated “measures” should be required. 
Cotes’s result was the first (apart from the near-miss of Johann Bernoulli) 
to relate the two, showing that in the wider domain of complex functions 
the logarithm and inverse circular functions are essentially the same. 

The most compact statement of their relationship was reached around 
1740, when Euler shifted attention from the logarithm function to its inverse, 
the exponential function. The definitive formula 


$$e^{ix} = \cos x + i\sin x$$


was first published by Euler (1748a), who derived it by comparing series 
expansions of both sides. Euler’s formulation in terms of the single-valued 
function $e^{ix}$ gave a simple explanation of the many values of the
logarithm (which Cotes had missed) as a consequence of the periodicity of cos
and sin. A direct explanation, based on the definition of log as an inte- 
gral, became possible when Gauss (1811) clarified the meaning of com- 
plex integrals and pointed out their dependence on the path of integration 
(see Section 12.3). 
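
A short numerical illustration of these two points may help. The Python
sketch below is illustrative only: the function name `log_values`, the
sample angle 0.75, and the range of k are choices made here. It checks
Cotes's relation for one angle and then lists several values w, differing
by multiples of 2πi, all of which satisfy e^w = i.

```python
import cmath
import math


def log_values(z, ks=range(-2, 3)):
    """Some of the infinitely many values of log z: the principal value plus 2*pi*i*k."""
    principal = cmath.log(z)
    return [principal + 2j * math.pi * k for k in ks]


x = 0.75
print(cmath.log(complex(math.cos(x), math.sin(x))))   # approximately 0.75j, as Cotes's relation says
for w in log_values(1j):
    print(w, cmath.exp(w))                             # every exp(w) is approximately 1j
```

Because cos and sin have period 2π, adding 2πi to w leaves e^w unchanged,
so each of the printed w has an equal claim to be called log i.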
Euler’s formula also shows that

$$(\cos x + i\sin x)^n = e^{inx} = \cos nx + i\sin nx$$

and hence gives a deeper explanation of the Leibniz–de Moivre formula.
More generally, the addition theorems for cos and sin (Section 10.6) could
be seen as consequences of the much simpler addition formula for the
exponential function

$$e^{u+v} = e^u e^v.$$

The imaginary function $e^{ix}$ was so much more coherent than its real
constituents cos x and sin x that it was difficult to do without it, and
Euler’s formula gave mathematicians a strong push toward the eventual
acceptance of complex numbers. A more detailed account of the role of the
logarithm and exponential functions in the development of complex numbers
may be found in Cajori (1913).
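
Both identities are easy to test numerically. In the Python sketch below
the function names `de_moivre_gap` and `addition_gap` are hypothetical
helpers introduced for this check; each returns the size of the
discrepancy between the two sides, which should be at the level of
rounding error.

```python
import cmath
import math


def de_moivre_gap(x, n):
    """|(cos x + i sin x)^n - (cos nx + i sin nx)|."""
    lhs = complex(math.cos(x), math.sin(x)) ** n
    rhs = complex(math.cos(n * x), math.sin(n * x))
    return abs(lhs - rhs)


def addition_gap(x, y):
    """|e^{ix} e^{iy} - (cos(x+y) + i sin(x+y))|.

    Multiplying out e^{ix} e^{iy} gives cos x cos y - sin x sin y as the real
    part and sin x cos y + cos x sin y as the imaginary part, so a zero gap
    is exactly the pair of addition theorems for cos and sin.
    """
    product = cmath.exp(1j * x) * cmath.exp(1j * y)
    direct = complex(math.cos(x + y), math.sin(x + y))
    return abs(product - direct)


print(de_moivre_gap(0.3, 7))
print(addition_gap(0.4, 1.1))
```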


The Cauchy-Riemann Equations 


At almost the same time that Euler elucidated cosine and sine, d’Alembert
found many real functions occurring naturally in pairs as the real and
imaginary parts of complex functions. In hydrodynamics, d’Alembert (1752)
discovered that the equations

$$\frac{\partial P}{\partial y} = \frac{\partial Q}{\partial x}
\quad\text{and}\quad
\frac{\partial P}{\partial x} + \frac{\partial Q}{\partial y} = 0$$

relate the velocity components P, Q in two-dimensional steady irrotational
fluid flow. These equations come from the requirements that Q dy + P dx
and P dy − Q dx be complete differentials, in which case another complete
differential is

$$Q\,dy + P\,dx + i(P\,dy - Q\,dx) = (Q + iP)\left(dy + \frac{dx}{i}\right) = (Q + iP)\,d\!\left(y + \frac{x}{i}\right).$$

D’Alembert concluded that this means Q + iP is a function f of y + x/i, so
that Q = Re(f) and P = Im(f).

To feel the force of this result, one has to forget the modern definition 
of function, under which u(x, y) + iv(x, y) is a function of x + iy for any 
functions u, v. In the 18th-century context, a “function” f(x + iy) of x + iy 
was calculable from x+iy by elementary operations; at worst, f(x+iy) was 
a power series in x + iy. This imposes a strong constraint on u, v, namely 
that 


$$\frac{\partial u}{\partial x} = \frac{\partial v}{\partial y}, \qquad \frac{\partial u}{\partial y} = -\frac{\partial v}{\partial x}.$$

These were just the equations d’Alembert found in his hydrodynamical
investigations, but they came to be named the Cauchy–Riemann equations,
because Cauchy and Riemann stressed their key role in the study of
complex functions. The concept of complex function was solidified when 
Cauchy (1837) showed that a function f(z), where z = x + iy, merely had 
to be differentiable in order to be expressible as a power series in z. Thus it 
suffices to define a complex function f(z) as one that is differentiable with 
respect to z in order to guarantee that f is defined with 18th-century strict- 
ness. It follows, in particular, that the first derivative of f entails deriva- 
tives of all orders and that the values of f in any neighborhood determine 
its values everywhere. This rigidity in the notion of complex function is 
enough of a constraint to enable nontrivial properties to be proved, but at 
the same time it leaves enough flexibility (one might say “fluidity”) to
cover important general situations. 
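
To see how restrictive the Cauchy–Riemann equations are, one can test them numerically. The sketch below is an added illustration: the function name `cr_residuals`, the step size, and the sample point are assumptions. It estimates the partial derivatives of u and v by central differences and returns the two quantities that the equations require to vanish.

```python
import cmath

def cr_residuals(f, z: complex, h: float = 1e-6):
    """Return (u_x - v_y, u_y + v_x) at z, estimated by central differences."""
    dfdx = (f(z + h) - f(z - h)) / (2 * h)            # u_x + i v_x
    dfdy = (f(z + 1j * h) - f(z - 1j * h)) / (2 * h)  # u_y + i v_y
    return dfdx.real - dfdy.imag, dfdy.real + dfdx.imag

z0 = 0.8 - 0.3j   # arbitrary sample point
print(cr_residuals(lambda z: z * z, z0))          # both residuals close to 0
print(cr_residuals(cmath.exp, z0))                # both residuals close to 0
print(cr_residuals(lambda z: z.conjugate(), z0))  # first residual close to 2
```

The squaring and exponential functions pass the test, while the conjugate x - iy, although a perfectly good function of x + iy in the modern sense, fails it.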


EXERCISES 


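Euler’s derivation of $e^{ix} = \cos x + i\sin x$ is easy to explain using the power
series


$$e^{y} = 1 + \frac{y}{1!} + \frac{y^2}{2!} + \frac{y^3}{3!} + \cdots$$

and

$$\sin x = x - \frac{x^3}{3!} + \frac{x^5}{5!} - \frac{x^7}{7!} + \cdots$$


found in Section 8.5.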


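12.1.1 Assuming that the series for $e^{y}$ is also valid for $y = ix$, show that


$$e^{ix} = \left(1 - \frac{x^2}{2!} + \frac{x^4}{4!} - \frac{x^6}{6!} + \cdots\right) + i\left(x - \frac{x^3}{3!} + \frac{x^5}{5!} - \frac{x^7}{7!} + \cdots\right).$$


12.1.2 Assuming it is valid to differentiate the sine series term by term, show that


$$\cos x = 1 - \frac{x^2}{2!} + \frac{x^4}{4!} - \frac{x^6}{6!} + \cdots$$


and hence that $e^{ix} = \cos x + i\sin x$.

Another consequence of $e^{ix} = \cos x + i\sin x$ is that $i = \cos\frac{\pi}{2} + i\sin\frac{\pi}{2} = e^{i\pi/2}$,
which allows us to evaluate the outlandish number $i^i$.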


12.1.3 Show that $i^i$ has a real value (Euler (1746)). What is it?


12.1.4 Using the fact that $e^{2n\pi i} = 1$ for any integer $n$, give a formula for all values
of $i^i$ (Euler (1746)).


12.2 Conformal Mapping


Another important general situation clarified by complex functions is the 
problem of conformal mapping. Mapping a sphere (the earth’s surface) 
onto a plane is a practical problem that has attracted the attention of math- 
ematicians since ancient times. Before the 18th century, the most notable 
mathematical contributions to mapping were stereographic projection 
(Section 11.6), due to Ptolemy around 150 CE, and the Mercator projection
used by G. Mercator in 1569 (this Mercator was Gerard, not the Nicholas 
who discovered the series for log(1 + x)). Both these projections were 
conformal, that is, angle-preserving, or what 18th-century mathematicians 
called “similar in the small.” This means that the image f(R) of any region 
R tends toward an exact scale map of R as the size of R tends to 0. Since 
“similarity in the large” is clearly impossible—for example, a great circle 
cannot be mapped to a closed curve that divides the plane into two equal 
parts—conformality is the best one can do to preserve the appearance of 
regions on the sphere. Preservation of angles was intentional in the Merca- 
tor projection, whose purpose was to assist navigation, and in the case of 
stereographic projection conformality was first noticed by Harriot around 
1590 (see Lohne (1979)). 

Figure 12.1 illustrates the conformality of stereographic projection in 
the case of spherical triangles. The sphere has been divided into triangles
with angles $\pi/2$, $\pi/3$, $\pi/4$, and every other triangle has been cut out to
allow a light to shine from inside the sphere and to cast shadows on the 
plane. It can be seen that the shadow triangles indeed have the same angles 
as their counterparts on the sphere. (This example shows another feature 
of stereographic projection: it maps circles to circles.) 

Advances in the theory of conformal mapping were made by Lambert 
(1772), Euler (1777) (sphere onto plane), and Lagrange (1779) (general 
surface of revolution onto plane). All these authors used complex numbers, 
but Lagrange’s presentation is the clearest and most general. Using the 
method of d’Alembert (1752), he combined a pair of differential equations
in two real variables into a single equation in one complex variable and 
arrived at the result that any two conformal maps of a surface of revolution 
onto the (x, y)-plane are related via a complex function f(x + iy) mapping 
the plane onto itself. These results were crowned by the result of Gauss 
(1822) generalizing Lagrange’s theorem to conformal maps of an arbitrary 
surface onto the plane.

Figure 12.1: Example of stereographic projection


Conversely, a complex function f(z) defines a map of the z plane into 
itself, and it is easy to see that this map is conformal. In fact, this is a 
consequence of the differentiability of f. To say that a nonzero limit 


$$\lim_{\delta z \to 0} \frac{f(z_0 + \delta z) - f(z_0)}{\delta z}$$


exists is to say that the mapping of the disk $\{z : |z - z_0| < |\delta z|\}$ around $z_0$ to
the region around $f(z_0)$ tends to a scale mapping as $|\delta z|$ tends to 0. If the
derivative is expressed in polar form as 


$$f'(z_0) = re^{i\alpha},$$


then $r$ is the scale factor of this limit mapping and $\alpha$ is the angle of rotation.
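
The scale-and-rotate behavior can be observed directly. In the following sketch (an added illustration; the choice f(z) = z^2, the point z0, and the two directions are assumptions), two short segments leaving z0 are mapped by f. Their images are stretched by about r and turned by about the angle of f'(z0), so the angle between them is preserved.

```python
import cmath

def f(z: complex) -> complex:
    return z * z          # sample function, chosen only for illustration

def fprime(z: complex) -> complex:
    return 2 * z          # its derivative

z0 = 1 + 1j
r, alpha = cmath.polar(fprime(z0))    # f'(z0) = r e^{i alpha}

eps = 1e-4
d1 = eps * cmath.exp(0.3j)            # two short segments leaving z0
d2 = eps * cmath.exp(1.4j)
w1 = f(z0 + d1) - f(z0)               # their images under f
w2 = f(z0 + d2) - f(z0)

scale = abs(w1) / abs(d1)             # approximately r
rotation = cmath.phase(w1 / d1)       # approximately alpha
angle_before = cmath.phase(d2 / d1)   # angle between the segments
angle_after = cmath.phase(w2 / w1)    # angle between their images
print(scale, r, rotation, alpha, angle_before, angle_after)
```

Repeating the experiment at z0 = 0, where f'(z0) = 0, shows the angle between the images doubling instead.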
Riemann (1851) seems to have been the first to take the conformal map- 
ping property as a basis for the theory of complex functions. His deepest 
result in this direction was the Riemann mapping theorem, which states 
that any region of the plane bounded by a simple closed curve can be 
mapped onto the unit disk conformally, and hence by a complex function. 
The proof of this theorem in Riemann (1851) depends on properties that 
Riemann justified partly by an appeal to physical intuition that he called 
Dirichlet’s principle. Such reasoning went against the growing tendency 
toward rigor in 19th-century analysis, and stricter proofs were given by 
Schwarz (1870) and Neumann (1870). However, Riemann’s faith in the 
physical roots of complex function theory was eventually justified when 
Hilbert (1900b) put Dirichlet’s principle on a sound basis.


EXERCISES


The claim that differentiability of f(z) implies that f is a conformal mapping 
must be qualified by the condition $f'(z) \neq 0$, because if the scale factor tends to 0
then f cannot be said to be a scale mapping. At points where f’(z) = 0 one may 
find that angles are altered. Here is an example. 


12.2.1 Show that $f(z) = z^2$ defines a conformal mapping except at $z = 0$, where it
doubles angles. 


This is no surprise because $z \mapsto z^2$ is a two-sheeted covering of the plane C
(see Figure 11.9 in Section 11.7). 


12.2.2 Show that the map $z \mapsto z^2$ is two-to-one except at $z = 0$, and relate the
angle doubling at z = 0 to the branch point of the covering. 


12.2.3 Similarly describe the behavior of the map $z \mapsto z^3$ at $z = 0$.


12.3 Cauchy’s Theorem


We have seen that interesting complex functions arise from integration. 
For example, the elliptic functions come from inversion of elliptic integrals 
(Section 10.8). However, it is not at first clear what the integral $\int_{z_0}^{z} f(t)\,dt$
means when $z_0$, $z$ are complex numbers. It is natural, and not technically
difficult, to define $\int_{z_0}^{z} f(t)\,dt$ as $\int_C f(t)\,dt$, the integral of $f$ along a curve
$C$ from $z_0$ to $z$; the problem is that $\int_C f(t)\,dt$ appears to depend on $C$ and
hence may not be a function of $z$.

The first to recognize and resolve this problem seems to have been 
Gauss. In a letter to Bessel, Gauss (1811) raised the problem and claimed 
its resolution as follows: 


Now how is one to think of $\int \varphi(z)\,dz$ for $z = a + ib$? Evidently,
if one wishes to start from clear concepts, one must
assume that $z$ changes by infinitely small increments (each of
the form $\alpha + i\beta$) from that value for which the integral is to
be 0 to $c = a + ib$, and then sum all the $\varphi(z)\,dz$... But now
... continuous transition from one value of $z$ to another $a + ib$
takes place along a curve and hence is possible in infinitely
many ways. I now conjecture that the integral $\int \varphi(z)\,dz$ will
always have the same value after two different transitions if
$\varphi(z)$ never becomes infinite within the region enclosed by the
two curves representing the transitions.


Translation of Gauss (1811) in Birkhoff (1973), p. 31 


In the same letter, Gauss also observed that if $\varphi(z)$ does become infinite
in the region, then in general $\int \varphi(z)\,dz$ will take different values when
integrated along different curves. He saw in particular that the infinitely
many values of $\log c$ corresponded to the different ways a path from 1 to $c$
could wind around $z = 0$, the point where $\varphi(z) = 1/z$ becomes infinite.
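
Gauss’s observation about the many values of log c can be reproduced numerically. The sketch below is an added illustration: the midpoint-rule integrator, the endpoint c = i, and the helper names `integrate` and `arc` are assumptions. It integrates 1/z from 1 to i along arcs of the unit circle that make different numbers of circuits around 0.

```python
import cmath
import math

def integrate(f, path, steps: int = 20000) -> complex:
    """Midpoint-rule approximation of the integral of f along path(s), 0 <= s <= 1."""
    total = 0j
    prev = path(0.0)
    for k in range(1, steps + 1):
        cur = path(k / steps)
        total += f((prev + cur) / 2) * (cur - prev)
        prev = cur
    return total

def arc(turns: int):
    """Path on the unit circle from 1 to i with `turns` extra counterclockwise circuits of 0."""
    end_angle = math.pi / 2 + 2 * math.pi * turns
    return lambda s: cmath.exp(1j * end_angle * s)

# Same endpoints, different winding around the point where 1/z is infinite.
values = [integrate(lambda z: 1 / z, arc(k)) for k in (0, 1, -1)]
for k, v in zip((0, 1, -1), values):
    print(k, v)
```

Each extra circuit changes the value of the integral by 2πi, which is exactly the ambiguity in the logarithm.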

The theorem that $\int f(t)\,dt$ is independent of the path in a region where
$f$ is finite (and differentiable, which went without saying for Gauss) is now
known as Cauchy’s theorem, since Cauchy was the first to offer a proof
and to develop the consequences of the theorem. An equivalent statement
is that $\int_C f(t)\,dt = 0$ for any closed curve $C$ in a region where $f$ is differentiable.
Cauchy presented a proof to the Paris Academy in 1814 but
first published it later (Cauchy (1825)). In Cauchy (1846) he gave a more 
transparent proof, based on the Cauchy—Riemann equations and the theo- 
rem of Green (1828) and Ostrogradsky (1828), relating a line integral to a 
surface integral. The latter theorem, usually known as Green’s theorem, is 
a generalization of the fundamental theorem of calculus to real functions 
g(x, y) of two variables and can be stated as follows: if C is a simple closed 
curve bounding a region R and g is suitably smooth, then 


$$\int_C g\,dx = -\iint_R \frac{\partial g}{\partial y}\,dx\,dy \quad\text{and}\quad \int_C g\,dy = \iint_R \frac{\partial g}{\partial x}\,dx\,dy,$$


where $\iint_R$ is the surface integral over $R$ and $\int_C$ is the line integral around $C$
in the counterclockwise sense. (The difference in sign in the two formulas
reflects the different sense of $C$ when $x$ and $y$ are interchanged.)


Cauchy’s theorem follows from Green’s by an easy calculation. If 
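$$f(t) = u(t) + iv(t)$$
is the decomposition of $f$ into real and imaginary parts, and if we write
$$dt = dx + i\,dy,$$
then


$$\int_C f(t)\,dt = \int_C (u + iv)(dx + i\,dy)$$

$$= \int_C (u\,dx - v\,dy) + i\int_C (v\,dx + u\,dy)$$

$$= -\iint_R \left(\frac{\partial u}{\partial y} + \frac{\partial v}{\partial x}\right)dx\,dy - i\iint_R \left(\frac{\partial v}{\partial y} - \frac{\partial u}{\partial x}\right)dx\,dy,$$

which equals 0 since


$$\frac{\partial u}{\partial y} + \frac{\partial v}{\partial x} = 0 \quad\text{and}\quad \frac{\partial v}{\partial y} - \frac{\partial u}{\partial x} = 0$$

by the Cauchy–Riemann equations. This proof requires $f$ to have a continuous
first derivative in order to be able to apply Green’s theorem. The
restriction of continuity of $f'(t)$ in the proof was removed by Goursat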
(1900). As it happens, if f’ exists, it will have not only continuity but 
also derivatives of all orders. 

This follows from one of the remarkable consequences Cauchy (1837) 
drew from the assumption $\int_C f(t)\,dt = 0$, namely, that $f$ has a power-series
expansion. By Goursat (1900), then, differentiability of a complex func- 
tion is enough to guarantee a power-series expansion. A generalization of 
this result to f that become infinite at isolated points was made by Lau- 
rent (1843) (f then has an expansion including negative powers, called the 
Laurent expansion) and to many-valued f with branch points by Puiseux 
(1850) (f then has an expansion in fractional powers, the Newton—Puiseux 
expansion). 


EXERCISES 


The Cauchy–Riemann equations follow easily from the existence of $f'(z)$,
that is, from the condition that


$$\lim_{\delta z \to 0} \frac{f(z + \delta z) - f(z)}{\delta z}$$


have the same value, regardless of the path along which $\delta z \to 0$.


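12.3.1 Suppose $f(z) = u(x, y) + iv(x, y)$ and $\delta z = \delta x + i\,\delta y$. By letting $\delta z \to 0$
along the $x$-axis ($\delta y = 0$) and along the $y$-axis ($\delta x = 0$), and equating the
resulting values of $f'(z)$, show that

$$\frac{\partial u}{\partial x} = \frac{\partial v}{\partial y}, \qquad \frac{\partial u}{\partial y} = -\frac{\partial v}{\partial x}.$$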


These equations give a convenient test for a function u(x, y) + iv(x, y) to be a 
differentiable function of z = x + iy. 


12.3.2 Check that $u(x, y) = x^2 - y^2$ and $v(x, y) = 2xy$ satisfy the Cauchy–Riemann
equations.


12.3.3 Express $x^2 - y^2 + 2ixy$ as a function of $z = x + iy$.


12.4 Double Periodicity of Elliptic Functions


The view of complex integration exposed by Cauchy’s theorem is one step 
toward understanding elliptic integrals such as $\int_0^z dt/\sqrt{t(t-\alpha)(t-\beta)}$. The
other important step is the idea of a Riemann surface (Section 11.8), which
enables us to visualize the possible paths of integration from 0 to $z$. The
“function” $1/\sqrt{t(t-\alpha)(t-\beta)}$ is of course two-valued and, by an argument
like that in Section 11.8, is represented by a two-sheeted covering of the
$t$ sphere, with branch points at $0$, $\alpha$, $\beta$, $\infty$. Thus the paths of integration,
correctly viewed, are curves on this surface, which is topologically a torus 
(again, as in Section 11.8). 


Now a torus contains certain closed curves that do not bound a piece 
of the surface, such as the red and blue curves, $C_1$ and $C_2$, shown in Figure
12.2. There is no region $R$ bounded by $C_1$ or $C_2$; hence Green’s theorem
does not apply, and we in fact obtain nonzero values


$$\omega_1 = \int_{C_1} \frac{dt}{\sqrt{t(t-\alpha)(t-\beta)}}, \qquad \omega_2 = \int_{C_2} \frac{dt}{\sqrt{t(t-\alpha)(t-\beta)}}.$$


Consequently the integral 


$$\Phi^{-1}(z) = \int_0^z \frac{dt}{\sqrt{t(t-\alpha)(t-\beta)}}$$


will be ambiguous: for each value $\Phi^{-1}(z) = w$ obtained for a certain path $C$
from 0 to $z$ we also obtain the values $w + m\omega_1 + n\omega_2$ by adding to $C$ a detour
that winds $m$ times around $C_1$ and $n$ times around $C_2$. (For topological
reasons, this is essentially the most general path of integration.) 


It follows that the inverse relation $\Phi(w) = z$, the elliptic function corresponding
to the integral, satisfies


$$\Phi(w) = \Phi(w + m\omega_1 + n\omega_2)$$


for any integers $m$, $n$. That is, $\Phi$ is doubly periodic, with periods $\omega_1$, $\omega_2$.
This intuitive explanation of double periodicity is due to Riemann (1851), 
who later (Riemann (1858a)) developed the theory of elliptic functions 
from this standpoint.

Figure 12.2: Nonbounding curves on the torus


Remarkable series expansions of elliptic functions, which exhibit the 
double periodicity analytically, were discovered by Eisenstein (1847). The 
precedents for Eisenstein’s series, as Eisenstein himself pointed out, were 
partial fraction expansions of circular functions discovered by Euler, for 
example 


$$\pi\cot\pi x = \sum_{n=-\infty}^{\infty} \frac{1}{x + n}$$


(Euler (1748a), p. 191). It is obvious (at least formally, though one has to 
be a little careful about the meaning of this summation to ensure conver- 
gence) that the sum is unchanged when x is replaced by x + 1; hence the 
period 1 of $\pi\cot\pi x$ is exhibited directly by its series expansion. Eisenstein
showed that doubly periodic functions could be obtained by analogous 
expressions, such as 


$$\sum_{m,n=-\infty}^{\infty} \frac{1}{(z + m\omega_1 + n\omega_2)^2},$$


which again (with suitable interpretation to ensure convergence) are obviously
unchanged when $z$ is replaced by $z + \omega_1$ or $z + \omega_2$. Hence we obtain a
function with periods $\omega_1$, $\omega_2$. The function above is in fact identical (up to
a constant) with the Weierstrass $\wp$-function, mentioned in Section 10.7 as
the inverse to the integral $\int dt/\sqrt{4t^3 - g_2 t - g_3}$. Weierstrass (1863), p. 121,
found the relations between $g_2$, $g_3$ and the periods $\omega_1$, $\omega_2$:


$$g_2 = 60\sum \frac{1}{(m\omega_1 + n\omega_2)^4},$$

$$g_3 = 140\sum \frac{1}{(m\omega_1 + n\omega_2)^6},$$


where the sums are over all pairs $(m, n) \neq (0, 0)$. Elegant modern accounts
of the Eisenstein and Weierstrass theories may be found in Weil (1976) 
and Robert (1973). 
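
The remark about the meaning of the summation in Euler’s expansion of π cot πx can be made concrete. In the sketch below (an added illustration; the function name `cot_series` and the truncation at 100000 terms are assumptions), the terms for n and -n are paired so that the partial sums converge, and the result is compared with π cot πx and with the same sum taken at x + 1.

```python
import math

def cot_series(x: float, terms: int = 100000) -> float:
    """Symmetric partial sum of 1/(x + n) for n = -terms, ..., terms."""
    total = 1.0 / x
    for n in range(1, terms + 1):
        # Pair n with -n; taken separately, the two halves of the sum diverge.
        total += 1.0 / (x + n) + 1.0 / (x - n)
    return total

x = 0.3                                   # arbitrary non-integer sample value
exact = math.pi / math.tan(math.pi * x)   # pi cot(pi x)
approx = cot_series(x)
shifted = cot_series(x + 1)               # the same infinite sum, reindexed
print(exact, approx, shifted)
```

The truncated sums at x and x + 1 differ only in their end terms, which is the numerical shadow of the period 1.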


EXERCISES 
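The precise definition of the Weierstrass $\wp$-function is
$$\wp(z) = \frac{1}{z^2} + \sum_{(m,n)\neq(0,0)} \left[\frac{1}{(z + m\omega_1 + n\omega_2)^2} - \frac{1}{(m\omega_1 + n\omega_2)^2}\right].$$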
This series has better convergence than the Eisenstein series given above, but its 
double periodicity is not quite so obvious. We can establish double periodicity by 


differentiating and integrating as follows (which is valid because of the conver- 
gence properties of the Weierstrass series). 


12.4.1 By differentiating term by term, show that


$$\wp'(z) = -2\sum_{m,n=-\infty}^{\infty} \frac{1}{(z + m\omega_1 + n\omega_2)^3}$$


and conclude that $\wp'(z + \omega_1) = \wp'(z)$ and $\wp'(z + \omega_2) = \wp'(z)$.


12.4.2 By integrating the equations just obtained, show that
$$\wp(z + \omega_1) - \wp(z) = c \quad\text{and}\quad \wp(z + \omega_2) - \wp(z) = d,$$
for some constants $c$ and $d$.


12.4.3 Deduce from Exercise 12.4.2 that
$$\wp\left(\frac{\omega_1}{2}\right) - \wp\left(-\frac{\omega_1}{2}\right) = c \quad\text{and}\quad \wp\left(\frac{\omega_2}{2}\right) - \wp\left(-\frac{\omega_2}{2}\right) = d.$$


12.4.4 But $\wp(z) = \wp(-z)$ (why?); hence conclude that $\wp$ is doubly periodic.


12.5 Elliptic Curves


We have seen that nonsingular cubic curves of the form 
$$y^2 = ax^3 + bx^2 + cx + d \qquad (1)$$


are important not only among the cubic curves themselves (see Newton’s 
classification, Sections 6.4 and 7.4), but also in number theory (Section 
10.3) and the theory of elliptic functions (Section 10.4). One of the great 
achievements of 19th-century mathematics was finding a unified view of 
all these aspects of cubic curves. The view was glimpsed by Jacobi (1834), 
and it came more clearly into focus with the development of complex anal- 
ysis between Riemann (1851) and Poincaré (1901). The theory of elliptic 
curves, as the unified view is now known, continues to inspire researchers 
today, since it seems to encompass some of the most fascinating problems 
of number theory. We now know, for example, how to derive Fermat’s last 
theorem (see Section 10.1) from properties of elliptic curves. 


Jacobi saw, at least implicitly, that the curve (1) could be parameterized 
as 


$$x = f(z), \quad y = f'(z), \qquad (2)$$


where f and its derivative f’ are elliptic functions. Knowing that f and 
$f'$ are doubly periodic, with the same periods $\omega_1$, $\omega_2$, say, he would have
seen that this gave a map of the $z$ plane $\mathbb{C}$ onto the curve (1) for which the
preimage of a given point on (1) is a set of points in $\mathbb{C}$ of the form


$$z + \Lambda = \{z + m\omega_1 + n\omega_2 : m, n \in \mathbb{Z}\},$$


where
$$\Lambda = \{m\omega_1 + n\omega_2 : m, n \in \mathbb{Z}\}.$$


$\Lambda$ is called the lattice of periods of $f$. The numbers $z + m\omega_1 + n\omega_2$ in $z + \Lambda$
are said to be equivalent with respect to $\Lambda$. One such equivalence class is
shown by asterisks inside parallelograms in Figure 12.3.
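
Equivalence with respect to the lattice is easy to compute with. The following sketch is an added illustration: the function names `lattice_coords`, `reduce_mod_lattice`, and `equivalent`, and the sample periods, are assumptions. It writes a complex number in real coordinates relative to the two periods, then takes fractional parts to find the representative of its class in the parallelogram with vertices 0, w1, w2, w1 + w2.

```python
import math

def lattice_coords(z: complex, w1: complex, w2: complex):
    """Real numbers (s, t) with z = s*w1 + t*w2; requires w1/w2 to be non-real."""
    det = w1.real * w2.imag - w1.imag * w2.real
    s = (z.real * w2.imag - z.imag * w2.real) / det
    t = (w1.real * z.imag - w1.imag * z.real) / det
    return s, t

def reduce_mod_lattice(z: complex, w1: complex, w2: complex) -> complex:
    """Representative of the class of z in {s*w1 + t*w2 : 0 <= s, t < 1}."""
    s, t = lattice_coords(z, w1, w2)
    return (s - math.floor(s)) * w1 + (t - math.floor(t)) * w2

def equivalent(z1: complex, z2: complex, w1: complex, w2: complex,
               tol: float = 1e-9) -> bool:
    """True if z1 - z2 is, up to rounding error, a period m*w1 + n*w2."""
    s, t = lattice_coords(z1 - z2, w1, w2)
    return abs(s - round(s)) < tol and abs(t - round(t)) < tol

w1, w2 = 2 + 0j, 1 + 1.5j          # sample periods
z = 0.4 + 0.3j
z_far = z + 3 * w1 - 5 * w2        # another member of the same class
print(equivalent(z, z_far, w1, w2), reduce_mod_lattice(z_far, w1, w2))
```

Taking fractional parts in both coordinates is the computational counterpart of identifying opposite sides of the parallelogram.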


The parameterization (2) gives a one-to-one correspondence between
the points $(f(z), f'(z))$ of the curve and the equivalence classes $z + \Lambda$. Today
we say that the curve is isomorphic to the space $\mathbb{C}/\Lambda$ of these equivalence
classes. Jacobi might have seen, though it was probably not of interest to
him, that $\mathbb{C}/\Lambda$ is a torus. One sees this by taking one parallelogram in $\mathbb{C}$,
which includes a representative of each equivalence class, and identifying
the equivalent points on its boundary (that is, pasting opposite sides together,
as in Figure 12.4). Of course, the torus form of (1) eventually came to light
through the Riemann surface construction given in Section 11.8.


Figure 12.3: Lattice-equivalent points


Figure 12.4: Construction of torus by pasting


Weierstrass (1863) elegantly showed both the double periodicity of
elliptic functions and the parameterization of cubic curves. Beginning with 


$$\sum_{m,n=-\infty}^{\infty} \frac{1}{(z + m\omega_1 + n\omega_2)^2},$$


which is obviously doubly periodic, he defined the function


$$\wp(z) = \frac{1}{z^2} + \sum_{(m,n)\neq(0,0)} \left[\frac{1}{(z + m\omega_1 + n\omega_2)^2} - \frac{1}{(m\omega_1 + n\omega_2)^2}\right],$$


which has better convergence properties and is also doubly periodic. He
then showed by simple computations with series that


$$\wp'(z)^2 = 4\wp(z)^3 - g_2\wp(z) - g_3,$$


where $g_2$, $g_3$ are the constants, depending on $\omega_1$, $\omega_2$, that were defined in
Section 12.4. It follows that the point $(\wp(z), \wp'(z))$ lies on the curve


$$y^2 = 4x^3 - g_2 x - g_3, \qquad (3)$$


and a little further checking shows that (3) is in fact isomorphic to $\mathbb{C}/\Lambda$,
where $\Lambda$ is the lattice of periods of $\wp$. The parameterization of all curves (1)
by elliptic functions follows by making a linear transformation.
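
Weierstrass’s relation between the function and its derivative can be tested with truncated sums. The sketch below is an added illustration and rests on several assumptions: the square lattice with periods 1 and i, the truncation to |m|, |n| <= 40, the sample point z, and the helper names are all arbitrary choices, and the truncated series only approximate the true functions. It evaluates both sides of the differential equation and reports their relative difference.

```python
def lattice(w1: complex, w2: complex, N: int):
    """Nonzero periods m*w1 + n*w2 with |m|, |n| <= N (a square truncation)."""
    return [m * w1 + n * w2
            for m in range(-N, N + 1)
            for n in range(-N, N + 1)
            if (m, n) != (0, 0)]

def wp(z: complex, periods) -> complex:
    """Truncated Weierstrass series for the p-function."""
    return 1 / z**2 + sum(1 / (z + w)**2 - 1 / w**2 for w in periods)

def wp_prime(z: complex, periods) -> complex:
    """Termwise derivative of the truncated series."""
    return -2 / z**3 - 2 * sum(1 / (z + w)**3 for w in periods)

periods = lattice(1, 1j, 40)                  # sample lattice and truncation
g2 = 60 * sum(1 / w**4 for w in periods)
g3 = 140 * sum(1 / w**6 for w in periods)

z = 0.3 + 0.2j                                # arbitrary sample point
lhs = wp_prime(z, periods) ** 2
rhs = 4 * wp(z, periods) ** 3 - g2 * wp(z, periods) - g3
rel_error = abs(lhs - rhs) / abs(lhs)
print(rel_error)                              # small, limited by the truncation
```

A small relative error is consistent with the point $(\wp(z), \wp'(z))$ lying on the curve (3); enlarging the truncation should shrink it further.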


The reason for saying that the curve and $\mathbb{C}/\Lambda$ are isomorphic (which
comes from the Greek for “same form”) is not only because they both 
have the form of a torus. They also have the same algebraic structure, 
which comes to light when we consider their natural addition operation. 


Once the curve (1) is parameterized as 


$$x = f(z), \quad y = f'(z),$$


the “addition” of points on the curve is induced by adding their parameter
values. By the double periodicity of $f$ and $f'$, this “addition” is simply
ordinary addition in $\mathbb{C}$, modulo $\Lambda$. In particular, it is immediate that addition
of points has some properties of ordinary addition, such as commutativity
and associativity. However, as mentioned in Section 10.3, addition
of parameter values $z$ is also reflected in the geometry of the curve. The
most concise statement of the relationship, due to Clebsch (1864), is that
if $z_1$, $z_2$, $z_3$ are parameter values of three collinear points, then


$$z_1 + z_2 + z_3 = 0 \bmod (\omega_1, \omega_2)$$

(or $z_1 + z_2 + z_3 \in \Lambda$). This means that addition of points also has an elementary
geometric interpretation, for which, incidentally, the algebraic properties
are far less obvious.

On the other hand, the straight-line interpretation of addition gives the 
simplest explanation of the addition theorems for elliptic functions. As we 
saw in Section 10.3, the value of f(z3) is easy to compute as a rational 
function of $f(z_1)$, $f'(z_1)$, $f(z_2)$, $f'(z_2)$ when $z_1$, $z_2$, $z_3$ are the parameter
values of collinear points. Originally, of course, the formula was obtained 
by Euler, with great difficulty, by manipulating the integral inverse to f 
(see Section 10.7). 

Another reason to accept $\mathbb{C}/\Lambda$ as the “right” view of the curve is that
it answers the seemingly unrelated question of classification by projective 
equivalence. Recall from Section 7.4 that Newton reduced cubics to the 
cusp type, the double-point type, and three nonsingular types using real 
projective transformations. All cubics with a cusp are, in fact, equivalent 
to $y^2 = x^3$, and all with a double point are equivalent to $y^2 = x^2(x + 1)$,
while the distinction between the nonsingular types disappears over the
complex numbers, where, as we now know, all are equivalent to tori $\mathbb{C}/\Lambda$.
The problem that remains is to decide projective equivalence among the
nonsingular cubics. Salmon (1851) showed that this was determined by a
certain complex number $\tau$, which can be computed from the equation of
the curve. He defined $\tau$ geometrically, so that its projective invariance was
obvious, with no thought of elliptic functions. But $\tau$ turned out to be nothing
but $\omega_1/\omega_2$, which means that two nonsingular cubics are projectively
equivalent if and only if their period lattices $\Lambda$ have the same shape.


EXERCISES 


Strictly speaking, the ratio $\tau = \omega_1/\omega_2$ determines only the shape of the parallelogram
with vertices $0$, $\omega_1$, $\omega_2$, and $\omega_1 + \omega_2$.


12.5.1 Explain how both the angle between adjacent sides of this parallelogram, 
and the ratio between their lengths, may be extracted from $\tau = \omega_1/\omega_2$.


The lattice of periods 
$$\Lambda = \{m\omega_1 + n\omega_2 : m, n \in \mathbb{Z}\}$$


can be viewed as the set of vertices in a tiling of the plane by copies of this parallelogram,
as in Figure 12.3. However, infinitely many differently shaped parallelograms
give the same $\Lambda$. Thus the number $\tau$ alone should not be taken to
characterize the shape of $\Lambda$.


12.5.2 Show that $\Lambda$ may also be tiled by copies of a parallelogram with shape
given by $\tau + 1$.

12.5.3 More generally, show that $\Lambda$ may be generated by any two of its elements,
$\omega_1' = a\omega_1 + b\omega_2$ and $\omega_2' = c\omega_1 + d\omega_2$, provided $ad - bc = \pm 1$. Hint: Write
down a product of matrices transforming the column vector of $(\omega_1, \omega_2)$ to
$(\omega_1', \omega_2')$ and back to $(\omega_1, \omega_2)$, and take its determinant.


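12.5.4 Deduce from Exercise 12.5.3 that the lattice $\Lambda = \{m\omega_1 + n\omega_2 : m, n \in \mathbb{Z}\}$
has shape characterized by the whole family of complex numbers


$$\frac{a\tau + b}{c\tau + d}, \quad\text{where } \tau = \frac{\omega_1}{\omega_2} \text{ and } a, b, c, d \text{ are integers with } ad - bc = \pm 1.$$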


There are functions of the complex variable $\tau$ that depend only on the lattice
$\Lambda$, and hence take the same value for each number $(a\tau + b)/(c\tau + d)$ characterizing
the lattice shape.


12.5.5 Consider $g_2$ and $g_3$ from Section 12.4, which are obviously functions $g_2(\Lambda)$
and $g_3(\Lambda)$ of the lattice $\Lambda$. Show that $g_2^3/g_3^2$ and $g_2^3/(g_2^3 - 27g_3^2)$ are both
functions of $\tau$.


The latter function is none other than the famous modular function mentioned 
in Section 5.7 in connection with the solution of the quintic equation. For more 
on its amazing properties, see McKean and Moll (1997). 


12.6 Uniformization 


The characteristic of nonsingular cubics that allows their parameterization 
by elliptic functions is their topological form. The two periods correspond 
to the two essentially different circuits around the torus (Figure 12.2). 

A representation of the x and y values on a curve by simultaneous 
functions of a single parameter z is sometimes called a uniform represen- 
tation, and so the problem of parameterizing all algebraic curves in this 
way came to be known as the uniformization problem. Once the elliptic 
case was understood, it became clear that a solution of the uniformization 
problem for arbitrary algebraic curves would depend on a better under- 
standing of surfaces: their topology, the periodicities associated with their 
closed curves, and the way these periodicities could be reflected in C. 
These problems were first attacked by Poincaré and Klein in the 1880s, 
and their work led to the eventual positive solution of the uniformization 
problem by Poincaré (1907) and Koebe (1907).

Even more important than the solution of this single problem, however,
was the amazing convergence of ideas in the preliminary work of Poincaré 
and Klein. They discovered that multiple periodicities are reflected in C 
by groups of transformations, and that the transformations in question are 
of the simple type $z \mapsto (az + b)/(cz + d)$, called linear fractional. We first
met these transformations, for a real variable, as transformations of the 
projective line in Section 7.6. 

Linear fractional transformations generalize the linear transformations 
$z \mapsto z + \omega_1$, $z \mapsto z + \omega_2$ naturally associated with the periods of elliptic
functions. However, while the transformations $z \mapsto z + \omega_1$, $z \mapsto z + \omega_2$
are algebraically and geometrically transparent (they commute, and they
generate the general transformations $z \mapsto z + m\omega_1 + n\omega_2$, which are simply
translations of the plane), the more general linear fractional transformations
are not as easily understood. Linear fractional transformations do
not normally commute, and their mastery requires a simultaneous grasp of 
algebraic, geometric, and topological aspects. 
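
The contrast between translations and general linear fractional transformations shows up as soon as one composes them. The sketch below is an added illustration; it uses the standard fact that composing maps of this type corresponds to multiplying their 2 x 2 coefficient matrices, and the names `compose`, `apply_map`, `translation` and the sample values are assumptions.

```python
def compose(A, B):
    """Matrix product A*B, representing the map 'apply B first, then A'."""
    (a, b), (c, d) = A
    (e, f), (g, h) = B
    return ((a * e + b * g, a * f + b * h),
            (c * e + d * g, c * f + d * h))

def apply_map(M, z: complex) -> complex:
    """Apply z -> (az + b)/(cz + d) for M = ((a, b), (c, d))."""
    (a, b), (c, d) = M
    return (a * z + b) / (c * z + d)

def translation(w):
    """Coefficient matrix of z -> z + w."""
    return ((1, w), (0, 1))

T1, T2 = translation(1), translation(0.5 + 2j)   # two sample translations
S = ((0, -1), (1, 0))                            # z -> -1/z

z = 0.3 + 0.7j
commute_translations = compose(T1, T2) == compose(T2, T1)
commute_general = compose(T1, S) == compose(S, T1)
print(commute_translations, commute_general)
print(apply_map(compose(T1, S), z), apply_map(compose(S, T1), z))
```

The two translations give the same composite in either order, while the translation by 1 and the map sending z to -1/z give visibly different images of the sample point.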

The simultaneous view was enormously fruitful in the development of 
group theory and topology, as we will see in Chapters 14 and 15. Geometry 
also got a new lease of life when Poincaré (1882) found that linear frac- 
tional transformations give an interpretation of non-Euclidean geometry, a 
field that until then had been a curiosity on the fringes of mathematics. In 
the next chapter we look at the origins of non-Euclidean geometry and see 
how the subject was transformed by Poincaré’s discovery. 


EXERCISES 


The first example, beyond the elliptic functions, of periodicity under linear 
fractional transformations is seen in the modular function derived in the previous 
exercise set. It turns out that the periodicity of the modular function can be generated
by two transformations: $z \mapsto z + 1$ and $z \mapsto -1/z$. This periodicity can be
depicted by a pattern shown in Figure 13.20. 


12.6.1 Check that $z \mapsto z + 1$ and $z \mapsto -1/z$ are among the transformations


$$z \mapsto \frac{az + b}{cz + d}, \quad\text{where } a, b, c, d \text{ are integers with } ad - bc = \pm 1.$$


12.6.2 Show that the transformations $z \mapsto z + 1$ and $z \mapsto -1/z$ do not commute.


12.6.3 Show that both $z \mapsto z + 1$ and $z \mapsto -1/z$ map the half-plane $\{\mathrm{Im}\, z > 0\}$
onto itself, and that $z \mapsto -1/z$ exchanges the inside and outside of the unit
circle.


Non-Euclidean Geometries


PREVIEW 


One of the new frontiers in geometry opened up by calculus was the study 
of curvature. The concept of curvature is particularly interesting for sur- 
faces, because it can be defined intrinsically. The intrinsic curvature, or 
Gaussian curvature as it is known, is unaltered by bending the surface, so 
it can be defined without reference to the surrounding space. 

This leads to the study of intrinsic surface geometry, in which distance, 
“lines” (curves of shortest length), angles, areas, and so on, are defined by 
measurements within the surface. 

The question then arises, to what extent does the intrinsic geometry of 
a curved surface resemble the classical geometry of the plane? For surfaces
of constant curvature, the difference is reflected in two of Euclid’s axioms: 
the axiom that straight lines are infinite, and the parallel axiom. 

On surfaces of constant positive curvature, such as the sphere, all lines 
are finite and there are no parallels. On surfaces of zero curvature there 
may also be finite straight lines; but if all straight lines are infinite the 
parallel axiom holds. The most interesting case is constant negative curva- 
ture because it leads to a realization of non-Euclidean geometry, found by 
Beltrami (1868a). 

Poincaré (1882) showed that some of Beltrami’s realizations arise nat- 
urally in complex analysis. Papers had already been published with pic- 
tures of patterns of non-Euclidean “lines,” most notably Schwarz (1872). 
Thus, non-Euclidean geometry was actually a part of existing mathematics,
but a part whose geometric nature had not previously been understood.


13.1 Transcendental Curves


We saw in Chapter 8 that calculus in the 17th century was greatly stim- 
ulated by problems in the geometry of curves. Differentiation grew from 
methods for constructing tangents, and integration from attempts to find 
areas and arc lengths. Not only did calculus unlock the secrets of the 
classical curves and of the algebraic curves defined by Descartes; it also 
extended the concept of curve itself. Once it became possible to handle 
slopes, lengths, and areas with precision, it also became possible to use 
these quantities to define new, nonalgebraic, curves. These were the curves 
called “mechanical” by Descartes (Section 6.3) and “transcendental” by 
Leibniz. In contrast to algebraic curves, which could be studied in some 
depth by purely algebraic methods, transcendental curves were insepara- 
ble from the methods of calculus. Hence it is not surprising that a new set 
of geometric ideas, the ideas of “infinitesimal” or differential geometry, 
emerged from the investigation of transcendental curves. 

Among the new results on transcendental curves was the first solution 
of the ancient problem of arc length. The problem was first posed for an 
algebraic curve, the circle, by the Greeks and in this case it is equivalent 
to an area problem (“squaring the circle”), since both area and arc length
of the circle depend on $\pi$. As we now know, $\pi$ is a transcendental number
(Section 2.3), so the arc length problem for the circle has no solution by 
the elementary means allowed by the Greeks. The first curve whose arc 
length could be found by elementary means was discovered by Harriot 
around 1590. It is the curve defined by the polar equation 


$$r = e^{k\theta}$$


known as the logarithmic or equiangular spiral. 

Harriot did not have the exponential function and knew the curve only 
by its equiangular property, which is that the tangent makes a constant 
angle $\alpha$ (depending on $k$) with the radius vector. The spiral turned up in his
researches on navigation and map projections (Section 12.2) as the plane 
projection of a rhumb line on the sphere. A rhumb line is a curve that 
meets the meridians at a constant angle; in practical terms, it represents 
the course of a ship sailing in a fixed compass direction. 

Not having the tools of calculus, Harriot relied on ingenious geometry 
and a simple limit argument, which was brought to light by Lohne (1979). 
The idea should be clear from Figure 13.1.

Figure 13.1: Harriot’s construction of the equiangular spiral


An isosceles triangle with base angles $\alpha$ is cut into similar trapezoids
by lines parallel to its base. When each trapezoid is cut by its diagonal, the
resulting triangles can be reassembled into a kind of fan bounded by a
polygonal spiral consisting of pieces of the red sides of the original
triangle. The red spiral is equiangular in the sense that every other line from
the common apex of the triangles meets it at angle $\alpha$.

When the construction is continued indefinitely, it is obvious that the 
total length of the red spiral is the sum of the red sides of the triangle. This 
is true independently of the height of the trapezoids. Now, if we let the 
height of the trapezoids approach zero, the polygonal spiral approaches a 
smooth equiangular spiral, whose length therefore equals the sum of the 
red sides of the triangle. 
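
The finiteness of the length can be checked numerically. The following Python sketch is an illustration under stated assumptions, not a reconstruction of Harriot's argument: it assumes the polar form $r = e^{k\theta}$ with a sample value of $k$, and the function name `spiral_polygon_length` is hypothetical. It sums the sides of a polygon inscribed in the spiral, starting very close to the pole, and compares the total with the value $\sqrt{1 + k^2}/k$ that integration gives for the length from the pole to the point where $r = 1$.

```python
import math

def spiral_polygon_length(k, theta_end, theta_start=-60.0, steps=200_000):
    """Length of a polygon inscribed in r = e^(k*theta) between two angles."""
    total = 0.0
    dtheta = (theta_end - theta_start) / steps
    r = math.exp(k * theta_start)
    x_prev, y_prev = r * math.cos(theta_start), r * math.sin(theta_start)
    for i in range(1, steps + 1):
        theta = theta_start + i * dtheta
        r = math.exp(k * theta)
        x, y = r * math.cos(theta), r * math.sin(theta)
        total += math.hypot(x - x_prev, y - y_prev)  # one side of the polygon
        x_prev, y_prev = x, y
    return total

k = 0.5  # sample value, chosen only for illustration
approx = spiral_polygon_length(k, theta_end=0.0)
exact = math.sqrt(1 + k * k) / k  # closed form for the length from the pole to r = 1
print(approx, exact)
```

The two printed values should agree to several decimal places, even though the polygon winds around the pole many times.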

Harriot’s work was not published, and the arc length of the equiangular
spiral was rediscovered by Torricelli (1645). Gradually the problem of
arc length became understood more systematically as a problem of
integration, though usually a rather intractable one. The first solution for an
algebraic curve was for the “semicubical parabola” $y^2 = x^3$, by Neil and
Heuraet in 1657. Soon after this Wren¹ solved the problem for the cycloid,
the path traced by a point on a circle rolling on a line. His solution was
given by Wallis (1659). Wren found, remarkably, that the length of one
arch of the cycloid is a rational multiple (namely, 4) of the diameter of the
circle.

¹This is none other than Sir Christopher Wren, famous for designing many churches in
London, such as St Paul’s cathedral.
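
Wren's value for the cycloid is easy to test numerically today. The sketch below assumes the standard parametrization $x = r(t - \sin t)$, $y = r(1 - \cos t)$ of the cycloid generated by a circle of radius $r$ (a parametrization that is not given in the text above), and the function name is illustrative. It estimates the length of one arch by the midpoint rule and divides by the diameter.

```python
import math

def cycloid_arch_length(radius, steps=100_000):
    """Midpoint-rule estimate of the length of one arch of the cycloid
    x = r(t - sin t), y = r(1 - cos t), 0 <= t <= 2*pi."""
    dt = 2 * math.pi / steps
    total = 0.0
    for i in range(steps):
        t = (i + 0.5) * dt
        dx = radius * (1 - math.cos(t))  # dx/dt
        dy = radius * math.sin(t)        # dy/dt
        total += math.hypot(dx, dy) * dt
    return total

radius = 1.5  # sample radius; the ratio below should not depend on it
ratio = cycloid_arch_length(radius) / (2 * radius)
print(ratio)  # close to 4
```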

Other remarkable properties of the cycloid are related to mechanics, 
and one of these will be seen geometrically in the next section. Another 
among the first known transcendental curves is the tractrix of Newton 
(1676b). Newton defined this curve by the property that the length of its 
tangent from point of contact to the x-axis is constant (Figure 13.2). 


Figure 13.2: The tractrix 


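It follows that the tractrix satisfies

$$\frac{dy}{ds} = -\frac{y}{a},$$

where $a$ is the constant length of the tangent and $s$ denotes arc length.
By using $ds^2 = dx^2 + dy^2$, this differential equation can be solved to give

$$x = a\log\frac{a + \sqrt{a^2 - y^2}}{y} - \sqrt{a^2 - y^2},$$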


the equation for the curve given, in more geometric language, by Huygens 
(1693b). Huygens pointed out that the curve could be interpreted as the 
path of a stone pulled by a string of length a (hence the name “tractrix”).
Thus the tractrix, too, has some mechanical significance. In fact it can be 
constructed from a famous mechanical curve, the catenary, which is the 
shape of a hanging chain. The method is described in the next section. But 
the most important role of the tractrix is to generate the pseudosphere, a 
surface of constant negative curvature discussed in Section 13.3. 
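
The defining property of the tractrix can be confirmed numerically from its cartesian equation. The sketch below is a check under stated assumptions: it takes the equation in the form $x = a\log\frac{a + \sqrt{a^2 - y^2}}{y} - \sqrt{a^2 - y^2}$, estimates $dx/dy$ by a central difference, and measures the tangent from the point of contact down to the x-axis. The function names and the sample value of $a$ are illustrative only.

```python
import math

def tractrix_x(y, a):
    """x-coordinate of the tractrix point at height y (0 < y <= a)."""
    w = math.sqrt(a * a - y * y)
    return a * math.log((a + w) / y) - w

def tangent_length(y, a, h=1e-6):
    """Distance along the tangent from the point (x, y) to the x-axis,
    using a central-difference estimate of dx/dy."""
    dx_dy = (tractrix_x(y + h, a) - tractrix_x(y - h, a)) / (2 * h)
    # The tangent drops a height y while moving y * |dx/dy| horizontally.
    return y * math.sqrt(1 + dx_dy ** 2)

a = 2.0  # sample string length
lengths = [tangent_length(y, a) for y in (0.2, 0.7, 1.3, 1.9)]
print(lengths)  # each value should be close to a
```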


EXERCISES 


The arc length of $y^2 = x^3$ is today a fairly routine exercise with the arc length
integral

$$\int \sqrt{1 + \left(\frac{dy}{dx}\right)^2}\,dx.$$


13.1.1 Show that the arc length of $y = x^{3/2}$ between $O$ and $x = a$ is

$$\frac{8}{27}\left[\left(1 + \frac{9a}{4}\right)^{3/2} - 1\right].$$


Likewise, it is easy for us to derive properties of the logarithmic spiral from
its polar equation and knowledge of the exponential function.


13.1.2 Show that the logarithmic spiral is self-similar. That is, magnifying $r = e^{k\theta}$
by a factor $m$ to $r = me^{k\theta}$ gives a curve that is congruent to the original (in
fact, it results from a rotation of the original).


Jakob Bernoulli was so impressed by this property of the logarithmic spi- 
ral that he arranged to have the spiral engraved on his tombstone, with a motto: 
Eadem mutata resurgo (“Though changed, I arise again the same”). (See Jakob 
Bernoulli (1692) p. 213.) 


13.1.3 Deduce the equiangular property of the logarithmic spiral from its
self-similarity.


The equation of the tractrix given above can be derived as follows. 


13.1.4 Explain why the constant tangent property implies $\frac{dy}{ds} = -\frac{y}{a}$, then multiply
both sides of this equation by $\frac{ds}{dx} = \sqrt{1 + \left(\frac{dy}{dx}\right)^2}$, and deduce that

$$\frac{dy}{dx} = -\frac{y}{\sqrt{a^2 - y^2}}.$$


13.1.5 Check by differentiation that $x = a\log\frac{a + \sqrt{a^2 - y^2}}{y} - \sqrt{a^2 - y^2}$ satisfies the
differential equation found in Exercise 13.1.4, and also show that x has the
appropriate value when y = a.


13.2 Curvature of Plane Curves 


As mentioned at the beginning of this chapter, curvature is one of the 
most important ideas in differential geometry. The extension of this idea 
from curves to surfaces and then to higher-dimensional spaces has had 
many important consequences for mathematics and physics, among them 
clarification of both the mathematical and physical meaning of “space,” 
“space-time,” and “gravitation.” In this section we look at the beginnings 
of the theory of curvature in the 17th-century theory of curves. 

Just as the direction of a curve C at point P is determined by its straight-line
approximation, that is, tangent, at P, the curvature of C at P is
determined by an approximating circle. Newton (1665c) was the first to single
out the circle that defines the curvature: the circle through P whose center
R is the limiting position of the intersection of the normal through P and
the normal through a nearby point Q on the curve. R is called the center
of curvature, $RP = \rho$ the radius of curvature, and $1/\rho = \kappa$ the curvature.
It follows that the circle of radius r has constant curvature 1/r. The only 
other curve of constant curvature is the straight line, which has curvature 0. 
This follows from the formula for curvature discovered by Newton (1671): 


$$\rho = \frac{1}{\kappa} = \frac{[1 + (dy/dx)^2]^{3/2}}{d^2y/dx^2}.$$
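
Newton's formula can be tried out numerically. The sketch below writes it in the form $\kappa = y''/(1 + y'^2)^{3/2}$ and estimates the derivatives by central differences; the function name `curvature`, the step size, and the two sample curves are assumptions made for illustration. The semicircle should give curvature of magnitude $1/r$ (the sign is negative because the graph is concave), and the straight line should give 0.

```python
import math

def curvature(f, x, h=1e-4):
    """Signed curvature of the graph y = f(x) at x, from
    kappa = y'' / (1 + y'^2)^(3/2), with central-difference derivatives."""
    d1 = (f(x + h) - f(x - h)) / (2 * h)
    d2 = (f(x + h) - 2 * f(x) + f(x - h)) / (h * h)
    return d2 / (1 + d1 * d1) ** 1.5

r = 3.0  # sample radius

def upper_semicircle(x):
    return math.sqrt(r * r - x * x)

def line(x):
    return 2 * x + 1

kappa_circle = curvature(upper_semicircle, 1.2)
kappa_line = curvature(line, 1.2)
print(abs(kappa_circle), 1 / r, kappa_line)
```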


There is an interesting relationship between a curve C and the locus 
C’ of the center of curvature of C. C is the so-called involute of C’, which, 
intuitively speaking, is the path of the end of a piece of string as it is 
unwound from C’ (Figure 13.3). It is intuitively clear that Q, the end of 
the string, is instantaneously moving in a circle with center at P, the point 
where the string is tangential to C’. 

Huygens discovered that the involute of a cycloid is another cycloid—a 
property used in Huygens (1673) to design clocks with a cycloidal pendu- 
lum. (Thus if the blue curve above is replaced by a cycloid, a weight Q on 
the end of a string PQ swings in a cycloidal path which, by another result 
of Huygens, takes constant time.) Two other stunning results on involutes 
are due to the Bernoulli brothers. Jakob Bernoulli (1692) found that the 
involute of the logarithmic spiral is another logarithmic spiral, and Johann 
Bernoulli (1691) found that the tractrix is the involute of the catenary, 
y = cosh x.

EXERCISES 


Despite the complexity of the Newton curvature formula, it is easy enough
to solve for y when the curvature $\kappa$ is zero.


13.2.1 Use the formula to show that $\kappa = 0$ implies that y is a linear function of x.


13.2.2 Show that $d\theta/ds = 1/r$ for the circle of radius r, and deduce that $d\theta/ds = \kappa$
for any curve.

The description of the tractrix as the involute of the catenary is convenient for 
studying the pseudosphere. We therefore work out some steps in this approach 
in the following exercises. The curve C’ in Figure 13.3 is now assumed to be the 
catenary y = cosh x, which meets the y-axis at the point S where y = 1. 


13.2.3 Using the arc length integral on the catenary y = cosh x between $S = (0, 1)$
and $P = (\sigma, \cosh\sigma)$, show that

$$\text{arc length } PS = \sinh\sigma = PQ.$$


Figure 13.3: Construction of the involute


13.2.4 Also find the equation of the tangent at P, and use it to show that
$R = (\sigma - \coth\sigma, 0)$. Then use the value of PQ to show that

$$QR = \frac{1}{\sinh\sigma} = \frac{1}{PQ}.$$


13.2.5 Finally, use the length of PQ again to show that $Q = (\sigma - \tanh\sigma, \operatorname{sech}\sigma)$,
and show that the parametric equations of the tractrix C,

$$x = \sigma - \tanh\sigma, \quad y = \operatorname{sech}\sigma,$$

imply the cartesian equation of the tractrix (with a = 1),

$$x = \log\frac{1 + \sqrt{1 - y^2}}{y} - \sqrt{1 - y^2}.$$




13.3 Curvature of Surfaces


The first approach to defining curvature at a point P of a surface S was to 
express it in terms of the curvature of plane curves, by considering sections 
of S by planes through the normal at P. Of course, different planes nor- 
mal to the surface at P may cut the surface in quite different curves, with 
different curvatures, as the example of the cylinder shows (Figure 13.4).


Figure 13.4: Sections of the cylinder


However, among these curves there will be one of maximum curvature 
and one of minimum curvature (which may be negative, if we give a sign 
to curvature according to the side on which the center of curvature lies). 
Euler (1760) showed that these two curvatures $\kappa_1$ and $\kappa_2$, called the
principal curvatures, occur in perpendicular sections and that together they
determine the curvature $\kappa$ in a section at angle $\alpha$ to one of the principal
sections by


$$\kappa = \kappa_1 \cos^2\alpha + \kappa_2 \sin^2\alpha.$$
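
Euler's formula can be tested on the cylinder of Figure 13.4. The sketch below is a numerical illustration under stated assumptions: the cylinder is taken to be $x^2 + y^2 = r^2$, the point is $P = (r, 0, 0)$, the principal curvatures are $1/r$ (circular section) and 0 (section along the axis), and $\alpha$ is measured from the circular section. The function names are hypothetical. The curvature of each normal section is estimated directly by a second difference and compared with Euler's expression.

```python
import math

def section_curvature(r, alpha, h=1e-4):
    """Curvature at P = (r, 0, 0) of the section of the cylinder
    x^2 + y^2 = r^2 by the plane through the normal at P and the
    tangent direction (0, cos(alpha), sin(alpha))."""
    def x_coordinate(t):
        # Point of the section at in-plane distance t along the tangent
        # direction: y = t*cos(alpha), so x = sqrt(r^2 - y^2).
        return math.sqrt(r * r - (t * math.cos(alpha)) ** 2)
    second = (x_coordinate(h) - 2 * x_coordinate(0.0) + x_coordinate(-h)) / (h * h)
    return abs(second)  # the first derivative vanishes at t = 0

def euler(k1, k2, alpha):
    """Euler's expression k1*cos^2(alpha) + k2*sin^2(alpha)."""
    return k1 * math.cos(alpha) ** 2 + k2 * math.sin(alpha) ** 2

r = 2.0  # sample radius
for alpha in (0.0, math.pi / 6, math.pi / 3, math.pi / 2):
    print(alpha, section_curvature(r, alpha), euler(1 / r, 0.0, alpha))
```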


This is where we are led when the curvature of surfaces is subordinated 
to the curvature of curves. A deeper idea occurred to Gauss while he was 
working in geodesy (surveying and mapmaking): curvature of a surface 
may be detectable intrinsically, that is, by measurements entirely within 
the surface. The curvature of the earth, for example, was known from mea- 
surements made by explorers and surveyors, not (in the time of Gauss) 
by viewing it from space. Gauss (1827) made the extraordinary discovery
that the quantity $\kappa_1\kappa_2$ can be defined intrinsically and hence can serve as an
intrinsic measure of curvature. He was so proud of this result that he called
it the theorema egregium (excellent theorem). It follows in particular that
$\kappa_1\kappa_2$, which is called the Gaussian curvature, is unaffected by bending.

The plane, for example, has $\kappa_1 = \kappa_2 = 0$ and thus zero Gaussian
curvature. Hence so has any surface obtained by bending a plane, such as a
cylinder. We can verify the theorema egregium in this case, because one
of the principal curvatures of a cylinder is obviously zero.

Surfaces $S_1$, $S_2$ obtained from each other by bending are said to be
isometric. More precisely, $S_1$ and $S_2$ are isometric if there is a one-to-one
correspondence between points $P_1$ of $S_1$ and points $P_2$ of $S_2$ such that


distance between $P_1$ and $P_1'$ in $S_1$ = distance between $P_2$ and $P_2'$ in $S_2$,


where the distances are measured within the respective surfaces. A more
precise statement of the theorema egregium then is: if $S_1$, $S_2$ are isometric,
then $S_1$, $S_2$ have the same Gaussian curvature at corresponding points.


Surfaces of Constant Curvature 


The simplest surface of constant positive curvature is the sphere of radius r,
which has curvature $1/r^2$ at all points. Other surfaces of curvature $1/r^2$ may
be obtained by bending portions of the sphere; however, all such surfaces 
have either edges or points where they are not smooth, as was proved by 
Hilbert (1901). The plane, as we have seen, has zero curvature, and so have 
all surfaces obtained by bending the plane or portions of it. 

It remains to investigate whether there are surfaces of constant negative 
curvature. In ordinary space, such a surface has principal curvatures of 
opposite sign at each point, so it looks locally like a saddle (Figure 13.5). 

Several surfaces of constant negative curvature were given by Mind- 
ing (1839). The most famous of them is the pseudosphere, the surface of 
revolution obtained by rotating a tractrix about the x-axis (Figure 13.6). 
This surface was investigated as early as 1693 by Huygens, who found its 
surface area, which is finite, and the volume and center of mass of the solid 
it encloses, which are also finite (Huygens (1693a)). 
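
The constant curvature of the pseudosphere can be checked numerically. The sketch below relies on two assumptions that go beyond the text: the parametrization $x = \sigma - \tanh\sigma$, $y = \operatorname{sech}\sigma$ of the tractrix with a = 1 (the one appearing in Exercise 13.2.5), and the standard differential-geometry formula $K = -x'(x'y'' - x''y')/\bigl(y\,(x'^2 + y'^2)^2\bigr)$ for the Gaussian curvature of the surface obtained by rotating a profile curve $(x(t), y(t))$ about the x-axis. The function names are hypothetical, and derivatives are estimated by central differences. A unit sphere is included as a sanity check.

```python
import math

def gaussian_curvature_of_revolution(x, y, t, h=1e-4):
    """Gaussian curvature at parameter t of the surface obtained by rotating
    the profile curve (x(t), y(t)), with y > 0, about the x-axis.

    Uses K = -x' (x' y'' - x'' y') / (y (x'^2 + y'^2)^2), with the
    derivatives estimated by central differences."""
    x1 = (x(t + h) - x(t - h)) / (2 * h)
    y1 = (y(t + h) - y(t - h)) / (2 * h)
    x2 = (x(t + h) - 2 * x(t) + x(t - h)) / (h * h)
    y2 = (y(t + h) - 2 * y(t) + y(t - h)) / (h * h)
    return -x1 * (x1 * y2 - x2 * y1) / (y(t) * (x1 * x1 + y1 * y1) ** 2)

# Tractrix with a = 1, parametrized as in Exercise 13.2.5.
def tractrix_x(s):
    return s - math.tanh(s)

def tractrix_y(s):
    return 1 / math.cosh(s)

for s in (0.5, 1.0, 2.0):
    print(s, gaussian_curvature_of_revolution(tractrix_x, tractrix_y, s))  # close to -1

# Sanity check: the profile (sin t, cos t) generates the unit sphere.
print(gaussian_curvature_of_revolution(math.sin, math.cos, 0.3))  # close to 1
```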

The pseudosphere, despite the “sphere” part of its name, is more like a
negative-curvature counterpart of the cylinder. So one may wonder whether
a surface of constant negative curvature can be more like a plane. Hilbert
(1901) proved that no smooth unbounded surface of constant negative
curvature lies in ordinary space, so this rules out planelike surfaces and also
accounts for the “edge” on the pseudosphere (where, in fact, the curvature
of the tractrix becomes infinite). One can, however, make a “plane” of
negative curvature by using a nonstandard notion of length in the Euclidean
plane. This discovery of Beltrami (1868a) is discussed in Section 13.7,
along with other implications of negative curvature for non-Euclidean
geometry.


Figure 13.5: A saddle


Figure 13.6: The tractrix and the pseudosphere

These geometric implications can also be glimpsed if we ask whether
surfaces $S_1$, $S_2$ of equal curvature are isometric. Even with zero
curvature this is false, since a plane is not isometric to a cylinder. What is true,
though, is that any sufficiently small portion of the plane can be mapped
isometrically into any part of the cylinder. Minding (1839) showed that
the same is true for any two surfaces $S_1$, $S_2$ of equal constant curvature.
Taking $S_1 = S_2$, this says that rigid motion is possible in $S_1$: a body in $S_1$
can be moved, without any shrinking or stretching, to any part of $S_1$ large
enough to contain it. The latter restriction is necessary, for example, for
the pseudosphere since it becomes arbitrarily narrow as $x \to \infty$.

The possibility of rigid motion was fundamental to Euclid’s geometry 
of the plane, and with the discovery of curved surfaces that support rigid 
motion, Euclid’s geometry could be seen as a special case—the zero cur- 
vature case—of something broader. The broader notion of geometry on a 
surface begins to take shape once one has an appropriate notion of “straight 
line.” This is developed in the next section. 


EXERCISES 


The construction of the tractrix as the involute of the catenary in Section 13.2 
gives a remarkable insight into the two principal curvatures of the pseudosphere, 
enabling us to see why the pseudosphere has constant negative curvature. 


13.3.1 Interpreting PQ in Figure 13.3 as the radius of curvature of the tractrix,
and hence as the radius of curvature of a section of the pseudosphere, suggest an
interpretation of QR as a radius of curvature.


13.3.2 Assuming that PQ and QR are in fact principal radii of curvature, deduce 
from Exercise 13.2.4 that 


Gaussian curvature of the pseudosphere at any point = $-1$.


13.4 Geodesics 


A “straight line” on a surface, or geodesic as it is called, can be defined 
equivalently by a shortest-distance property or a zero-curvature property. 
The shortest-distance definition has the drawback that a geodesic is not 
necessarily the shortest path between two points. On a sphere, for
example, there are two geodesics between two nearby points $P_1$, $P_2$: the short
portion and the long portion of the great circle through $P_1$, $P_2$. What is
true is that the geodesic gives the shortest distance between any two of its 
points that are sufficiently close together. Even so, it is generally hard to 
find which curve between given points on a surface has minimum length. 
Nevertheless, this is how geodesics were first defined, by Jakob and Johann 
Bernoulli; and Euler (1728a) found a differential equation for geodesics 
from this approach.

A more elementary approach is to define the geodesic curvature $\kappa_g$ at
P of a curve C on a surface S as the ordinary curvature of the orthogonal
projection of C in the tangent plane to S at P. As one might expect,
geodesic curvature can also be defined intrinsically, and $\kappa_g$ was introduced
in this way by Gauss (1825). A geodesic is then a curve of zero geodesic
curvature. This is the definition of Bonnet (1848).

The latter definition immediately shows that great circles on the sphere 
are geodesics, since their projections onto tangent planes are straight lines. 
Other examples are the horizontal lines, vertical circles, and helices on the 
cylinder (Figure 13.7). These all come from straight lines on the plane that 
is rolled up to form the cylinder. 


Figure 13.7: Geodesics on the cylinder 


Geodesics on the pseudosphere, and other surfaces of negative curva- 
ture, are not all so simple to describe. However, Section 13.8 shows that 
they become simple when one maps the surface of constant negative cur- 
vature suitably onto a plane. 


EXERCISES 


13.4.1 Are the circles on the pseudosphere, in planes perpendicular to its axis, 
geodesics? Give a qualitative argument to support your answer. 


It may be easier to answer this question if one first considers the cone, a 
surface also obtained by bending the plane. To avoid worrying about the apex, 
where the cone is not smooth, we omit this point. 


13.4.2 Show that the circles on the cone, in planes perpendicular to its axis, are 
not geodesics. 


13.4.3 Show that there are nonsmooth geodesics on the cone, that is, curves of 
geodesic curvature zero except at certain points where they have no 
tangent.


13.5 The Parallel Axiom


Until the 19th century, Euclid’s geometry enjoyed absolute authority, both 
as an axiomatic system and as a description of physical space. Euclid’s 
proofs were regarded as models of logical rigor, and his axioms were 
accepted as correct statements about physical space. Even today, Euclidean 
geometry is the simplest type of geometry, and it furnishes the simplest 
description of physical space for everyday purposes. Beyond the every- 
day world, however, lies a vast universe that can be understood only with 
the help of an expanded geometry. The expansion of geometric concepts 
began with doubts about one of Euclid’s axioms, the parallel axiom. 

For our purposes, the most convenient statement of the parallel axiom 
is as follows: 


Axiom $P_1$. For each straight line L and point P outside L there is exactly
one line through P that does not meet L. 


There are many other equivalent statements of Axiom $P_1$, some obviously
fairly close to it, for example, Euclid’s own from Section 2.1:


That if a straight line falling on two straight lines make the 
interior angles on the same side less than two right angles, 
the two straight lines, if produced indefinitely, meet on that 
side on which are the angles less than the two right angles. 


Heath (1925), p. 202

Other equivalents of Axiom $P_1$ are less obviously so. For example,


(i) The angle sum of a triangle = $\pi$ (Euclid).


(ii) The locus of points equidistant from a straight line is a straight line
(al-Haytham, around 1000 CE).


(iii) Similar triangles of different sizes exist (Wallis (1663); see Fauvel 
and Gray (1988), p. 510). 


Thus a denial of the parallel axiom entails denial of (i), (ii), and (iii). A 
denial of (iii) means in particular that scale models are impossible, since 
three points in the original object and the three corresponding points of a 
scale model would define similar triangles of different sizes.

Such unlikely consequences convinced many people that the parallel
axiom was a logically necessary property of straight lines, already implied 
by the other axioms of Euclid, and so efforts were made to prove it outright. 

The most tenacious attempt, entitled Euclides ab omni naevo vindica- 
tus (Euclid cleared of every flaw), was made by Saccheri (1733). Saccheri’s 
plan of attack began by subdividing the denial of the parallel axiom into 
two alternatives: 


Axiom $P_0$. There is no line through P that does not meet L.
Axiom $P_2$. There are at least two lines through P that do not meet L.


The next step was to destroy each alternative by deducing a contradiction 
from it. He succeeded in deducing a contradiction from Axiom $P_0$, using
other axioms of Euclid, such as the axiom that a straight line can be
prolonged indefinitely. (Such additional assumptions are certainly necessary,
since great circles on the sphere have some properties of straight lines, 
except that they are finite in length.) 

Saccheri was less successful with Axiom $P_2$. The consequences he
derived from it, hoping to obtain a contradiction, were as follows. Among
the lines M through P that do not meet L are two extremes, $M^+$ and $M^-$,
called parallels or asymptotic lines (Figure 13.8); any of the lines M
strictly between $M^+$ and $M^-$ has a common perpendicular with L and,
moreover, the position of this perpendicular tends to infinity as M tends
to $M^+$ or $M^-$. Although curious, these consequences of Axiom $P_2$ were
not contradictory and Saccheri, sensing that the contradiction was slipping
away from him, tried to overtake it by proceeding to infinity.




Figure 13.8: Asymptotic lines 


He claimed that $M^+$ would meet L at infinity and have a common
perpendicular with it there. But this still was not a contradiction. Saccheri
merely claimed that such a conclusion was “repugnant to the nature of the
straight line” (Saccheri (1733), p. 173), perhaps visualizing an intersection
like Figure 13.9. But why should asymptotic lines not be tangential
at infinity? History was to show that this was an appropriate resolution
of Saccheri’s “contradiction” (see Section 13.8). Thus Saccheri’s results
were not, as he thought, steps toward a proof of the parallel axiom; they
were the first theorems of a non-Euclidean geometry in which Axiom $P_2$
replaces the parallel axiom.


Figure 13.9: Hypothetical intersection at infinity


EXERCISES


The connection between the parallel axiom and the angle sum of a triangle 
is very direct and elegant. 


13.5.1 Deduce, from Euclid’s version of the parallel axiom, that a line falling on
two parallel lines makes interior angles that sum to $\pi$.


13.5.2 Use Exercise 13.5.1 and the construction in Figure 13.10 (in which CD is
parallel to AB) to show that $\alpha + \beta + \gamma = \pi$.


Figure 13.10: The angle sum of a triangle 


13.5.3 Deduce from Exercise 13.5.2 that the angle sum of any quadrilateral is $2\pi$
and, in particular, that squares exist.


Thus theorems mentioning squares, such as the Pythagorean theorem, can 
hold only when Euclid’s parallel axiom is assumed.


13.6 Spherical and Hyperbolic Geometry


In rejecting $P_0$ because of its incompatibility with infinite lines, Saccheri
ruled out the most natural geometry in which $P_0$ holds, that of the sphere
with great circles as “lines.” Spherical geometry had been cultivated since
ancient times by astronomers and navigators, and formulas for the side
lengths and areas of spherical triangles were well known. For the history of
this now-neglected subject, see Van Brummelen (2013). But the sphere was
considered part of Euclid’s spatial geometry, so the axiomatic significance
of spherical geometry was initially ignored. However, spherical geometry
did guide the first explorations of Axiom $P_2$.

Lambert (1766) made the striking discovery that Axiom $P_2$ implies that
the area of a triangle with angles $\alpha$, $\beta$, $\gamma$ is proportional to $\pi - (\alpha + \beta + \gamma)$,
its angular defect. In other words,


$$\text{area} = -R^2(\alpha + \beta + \gamma - \pi)$$


for some positive constant $R^2$. Having rediscovered the theorem (Exercise
13.6.5 below) that, for a triangle on the sphere of radius R,


$$\text{area} = R^2(\alpha + \beta + \gamma - \pi),$$


Lambert mused that one “could almost conclude that the new geometry 
would be true on a sphere of imaginary radius.” What a sphere of radius 
iR might be was unclear, but the idea that complex numbers can give the 
formulas of a hypothetical geometry proved fruitful. 
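
The spherical formula can be tested numerically. The sketch below is an illustration under stated assumptions: it computes the area of a spherical triangle from its angular excess, and compares the result with an independent computation of the solid angle subtended at the center of the sphere, using the formula $\tan(\Omega/2) = |a \cdot (b \times c)| / (1 + a \cdot b + b \cdot c + c \cdot a)$ for unit vectors $a$, $b$, $c$. That solid-angle formula is a standard result not found in the text, and all function names and sample vertices are hypothetical.

```python
import math

def dot(u, v):
    return sum(ui * vi for ui, vi in zip(u, v))

def cross(u, v):
    return (u[1] * v[2] - u[2] * v[1],
            u[2] * v[0] - u[0] * v[2],
            u[0] * v[1] - u[1] * v[0])

def normalize(u):
    n = math.sqrt(dot(u, u))
    return tuple(ui / n for ui in u)

def vertex_angle(a, b, c):
    """Angle at vertex a between the great-circle arcs ab and ac."""
    # Tangent directions at a: remove from b and c their components along a.
    tb = normalize(tuple(bi - dot(a, b) * ai for ai, bi in zip(a, b)))
    tc = normalize(tuple(ci - dot(a, c) * ai for ai, ci in zip(a, c)))
    return math.acos(max(-1.0, min(1.0, dot(tb, tc))))

def area_from_excess(a, b, c, R=1.0):
    """Area R^2 (alpha + beta + gamma - pi) of the triangle with unit-vector vertices."""
    excess = (vertex_angle(a, b, c) + vertex_angle(b, c, a)
              + vertex_angle(c, a, b) - math.pi)
    return R * R * excess

def area_from_solid_angle(a, b, c, R=1.0):
    """Independent check: R^2 times the solid angle seen from the center."""
    numerator = abs(dot(a, cross(b, c)))
    denominator = 1 + dot(a, b) + dot(b, c) + dot(c, a)
    return R * R * 2 * math.atan2(numerator, denominator)

a, b, c = (normalize(v) for v in ((1, 0.2, 0.1), (0.1, 1, 0.3), (0.2, 0.1, 1)))
print(area_from_excess(a, b, c), area_from_solid_angle(a, b, c))
```

As a simple special case, the triangle with vertices (1, 0, 0), (0, 1, 0), (0, 0, 1) has three right angles, so its excess is $\pi/2$, which is one eighth of the area $4\pi$ of the unit sphere.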

It was found that formulas implied by Axiom $P_2$ are obtained from
the corresponding formulas of spherical geometry by replacing R by iR. This
amounts to replacing circular functions by hyperbolic functions. For
example, Gauss (1831) deduced from Axiom $P_2$ that the circumference of a
circle of radius r is $2\pi R \sinh(r/R)$. The same expression follows by replacing
R by iR in $2\pi R \sin(r/R)$, which is the circumference of a circle of radius r
on the sphere of radius R (where, of course, r is measured on the spherical
surface; see the red circle in Figure 13.11 and Exercise 13.6.1).
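
The substitution of iR for R can be carried out literally with complex arithmetic. The following sketch, whose function names and sample values are illustrative assumptions, evaluates the spherical circumference formula at the imaginary radius iR and compares it with the hyperbolic formula. It also prints both circumferences for increasing R, where they should approach the Euclidean value $2\pi r$.

```python
import cmath
import math

def spherical_circumference(r, R):
    """Circumference of a circle of radius r on the sphere of radius R."""
    return 2 * math.pi * R * math.sin(r / R)

def hyperbolic_circumference(r, R):
    """The corresponding hyperbolic expression 2*pi*R*sinh(r/R)."""
    return 2 * math.pi * R * math.sinh(r / R)

r, R = 1.0, 2.0  # sample values
# Substitute iR for R in the spherical formula, using complex arithmetic.
substituted = 2 * math.pi * (1j * R) * cmath.sin(r / (1j * R))
print(substituted, hyperbolic_circumference(r, R))

# Both formulas approach the Euclidean value 2*pi*r as R grows.
for big_R in (10.0, 100.0, 1000.0):
    print(big_R, spherical_circumference(r, big_R),
          hyperbolic_circumference(r, big_R), 2 * math.pi * r)
```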

Lambert (1766) introduced the hyperbolic functions and noted their 
analogy with the circular functions, but he did not follow through with a 
complete translation of spherical formulas into hyperbolic formulas. This 
was first done by Taurinus (1826), one of a small circle who corresponded 
with Gauss on geometric questions.

The formulas gave the geometry of Axiom $P_2$ a second leg to stand
on, but there was still nothing solid under its feet. Neither Gauss nor
Taurinus seemed confident of finding an interpretation of the formulas. Gauss’s
student Minding (1840) even showed that the hyperbolic formulas for
triangles hold on the pseudosphere, but no one at that time commented on the
likely importance of this result. Perhaps it was clear that the pseudosphere
cannot serve as a “plane,” because it is infinite in only one direction.

Only in 1868, when Beltrami extended the pseudosphere to a true 
“plane”—a surface locally isometric to the pseudosphere but infinite in all 
directions—was the new geometry given a firm foundation. Klein (1871) 
named the geometry of Axiom $P_2$ hyperbolic geometry, and its “plane” is
now called the hyperbolic plane. 


EXERCISES 


13.6.1 Prove that the circumference of the circle C of radius r on the sphere of
radius R (Figure 13.11) is $2\pi R \sin(r/R)$.


Figure 13.11: Radius and circumference on the sphere


13.6.2 Show that both $2\pi R \sin(r/R)$ and $2\pi R \sinh(r/R)$ tend to $2\pi r$ as $R \to \infty$.


These results show how even non-Euclidean geometry is “Euclidean in the 
small”—its formulas tend to the Euclidean formulas as size tends to zero. The 
same is true of the angle-sum of a triangle, which has a surprising relationship 
with the area of the triangle. 

Figure 13.12 shows a spherical triangle $\Delta_{\alpha\beta\gamma}$ with angles $\alpha$, $\beta$, $\gamma$ and its sides
extended to three great circles. These great circles divide the sphere into eight
triangles, in four antipodal pairs. In particular, if the vertices of $\Delta_{\alpha\beta\gamma}$ are $A$, $B$, $C$
as shown then their respective antipodal points $A'$, $B'$, $C'$ form a triangle equal to
$\Delta_{\alpha\beta\gamma}$.


Figure 13.12: Division of sphere by three great circles 


The points $B$, $C$, $A'$ form a triangle $\Delta_\alpha$ which, together with $\Delta_{\alpha\beta\gamma}$, makes a
“wedge” of the sphere with angle $\alpha$ (shown in Figure 13.13).

This wedge obviously makes up $\alpha/2\pi$ of the total area S of the sphere, so we
can write

$$\Delta_{\alpha\beta\gamma} + \Delta_\alpha = \frac{\alpha}{2\pi} S.$$


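13.6.3 If we likewise define spherical triangles $\Delta_\beta = ACB'$ and $\Delta_\gamma = ABC'$ show
that

$$\Delta_{\alpha\beta\gamma} + \Delta_\beta = \frac{\beta}{2\pi} S,$$

$$\Delta_{\alpha\beta\gamma} + \Delta_\gamma = \frac{\gamma}{2\pi} S,$$

and hence that

$$3\Delta_{\alpha\beta\gamma} + \Delta_\alpha + \Delta_\beta + \Delta_\gamma = \frac{\alpha + \beta + \gamma}{2\pi} S.$$


Figure 13.13: Wedge of sphere between two great circles


13.6.4 Show also that

$$2(\Delta_{\alpha\beta\gamma} + \Delta_\alpha + \Delta_\beta + \Delta_\gamma) = S.$$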


13.6.5 Deduce from questions 13.6.3 and 13.6.4 that

$$\Delta_{\alpha\beta\gamma} = \frac{S}{4\pi}(\alpha + \beta + \gamma - \pi),$$

so the area of a spherical triangle with angles $\alpha$, $\beta$, $\gamma$ is proportional to
$\alpha + \beta + \gamma - \pi$. This formula was discovered by Thomas Harriot in 1603.


13.6.6 Deduce from Harriot’s area formula that the angle sum of a spherical
triangle tends to $\pi$ as its size tends to zero.


13.7 Geometry of Bolyai and Lobachevsky 


The most important contributors to hyperbolic geometry between Gauss 
and Beltrami were Lobachevsky and Bolyai, who published independent 
discoveries of the subject: Lobachevsky (1829) and Janos Bolyai (1832b).
Because of their courage in advocating an unconventional geometry, Bolyai
and Lobachevsky have been justly admired. Nevertheless, the immediate 
impact of their work was slight. Many of their results were already known 
to Gauss and his circle and could have been picked up from existing publi- 
cations and personal contacts. Lambert (1766) and Taurinus (1826) were in 
print, and Bolyai’s father, F. Bolyai, was a lifelong friend of Gauss, as was 
Lobachevsky’s teacher Bartels. In any case their work, though more sys- 
tematic and convincing than previous attempts, attracted very little atten- 
tion at first. We have seen how the possibility of using differential geometry 
to justify hyperbolic geometry was overlooked until 1868. Up to that time, 
there seemed no reason to take hyperbolic geometry seriously. 

In retrospect, of course, the theorems of Bolyai and Lobachevsky can 
be seen to unify the fragmentary results of their predecessors very nicely. 
They cover the basic relations between sides and angles of triangles (hyper- 
bolic trigonometry), the measure of polygonal areas by angular defect, and 
formulas for circumference and area of circles. Lobachevsky (1836) broke 
new ground by finding volumes of polyhedra, which turn out to be far from 
elementary, involving the function $\int_0^\theta \log 2|\sin t|\,dt$.

Both Bolyai and Lobachevsky considered a three-dimensional space
satisfying Axiom $P_2$ and made extensive use of a surface peculiar to this
space, the horosphere. A horosphere is a “sphere with center at infinity,”
and it is not a hyperbolic plane. Wachter, a student of Gauss, observed in a
letter of 1816 (published in Stäckel (1901)) that the geometry of the
horosphere is in fact Euclidean. This astonishing result was rediscovered by
Bolyai and Lobachevsky, and they anticipated that it would make Euclidean
geometry subordinate to hyperbolic. We will see in Section 13.8 how this
view was vindicated by the work of Beltrami.


Beltrami’s Projective Model 


Interest in hyperbolic geometry was rekindled in the 1860s when unpub- 
lished work of Gauss, who had died in 1855, came to light. Learning that 
Gauss had taken hyperbolic geometry seriously, mathematicians became 
more receptive to non-Euclidean ideas. The works of Bolyai and 
Lobachevsky were rescued from obscurity and, approaching them from 
the viewpoint of differential geometry, Beltrami (1868a) was able to give 
them the concrete explanation that had eluded all his predecessors.

Beltrami had studied the geometry of surfaces and found the surfaces
that can be mapped onto the plane in such a way that their geodesics go 
to straight lines (Beltrami (1865)). They turn out to be just the surfaces 
of constant curvature. In the case of positive curvature, the sphere, such a 
mapping is central projection onto a tangent plane (Figure 13.14), though 
of course this maps only half the sphere onto the whole plane. 




Figure 13.14: Central projection 


The mappings of surfaces of constant negative curvature, on the other 
hand, take the whole surface onto only part of the plane. Figure 13.15, 
from Klein (1928), shows some of these mappings (the middle one being 
of the pseudosphere). The correspondence between the surfaces and their 
maps is easier to see if one imagines each surface rotated 90° clockwise, so 
that its geodesics point in roughly the same direction as the straight lines 
on the map. 


Each negatively curved surface S is mapped onto a portion of the unit
disk. Beltrami (1868a) realized that the disk can then be viewed as a
natural extension of S to an “infinite plane,” thus avoiding the problem of
finding “planelike” surfaces of constant negative curvature in ordinary space.
Instead one takes the disk as the “plane,” line segments within it as “lines,” 
and “distance” between two points of the disk as the distance between their 
preimage points on the surface S. The function d(P, Q), giving “distance” 
between points P, Q of the disk in this way, turns out to be meaningful for 
all points inside the unit circle, so the notion of “distance” extends to the 
whole open disk. As Q approaches the unit circle, d(P, Q) tends to infinity, 
so the “plane,” and hence the “lines” in it, are indeed infinite with respect 
to this nonstandard “distance.”


Figure 13.15: Geodesic-preserving mappings 


All the axioms of Euclid, except the parallel axiom, hold in the new 
interpretation of “plane,” “line,” and “distance.” Instead of the parallel
axiom, one has of course Axiom $P_2$, since there is more than one “line”
through a point P outside a given “line” L that does not meet L (Figure 
13.16). 

Beltrami also observed that the rigid motions of the “plane,” since they 
map lines to lines, are necessarily projective transformations. They are 
precisely those projective transformations of the plane that map the unit 
circle onto itself. Consequently, this model of the hyperbolic plane is often

Figure 13.16: Failure of the parallel axiom


called the projective model. Cayley (1859) had already observed that these 
projective transformations could be used to define a “distance” d(P, Q) in 
the unit disk—by saying d(P, Q) = d(P’, Q’) if a transformation preserving 
the unit circle sends P to P’ and Q to Q’—but he had not realized that the 
geometry obtained was that of Bolyai and Lobachevsky. 
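The behavior of the “distance” d(P, Q) described above can be checked numerically. The text does not state a formula, so the sketch below assumes the standard closed form for the unit-disk projective model with curvature −1, d(P, Q) = arccosh((1 − P·Q)/√((1 − |P|²)(1 − |Q|²))); this formula and the helper name `klein_distance` are assumptions brought in for illustration, not taken from Beltrami or Cayley.

```python
import math

def klein_distance(p, q):
    """Assumed 'distance' between points p, q of the open unit disk
    in the projective model (curvature -1)."""
    px, py = p
    qx, qy = q
    dot = px * qx + py * qy
    denom = math.sqrt((1 - (px * px + py * py)) * (1 - (qx * qx + qy * qy)))
    # max(...) guards against rounding slightly below 1 when p == q
    return math.acosh(max(1.0, (1 - dot) / denom))

center = (0.0, 0.0)
# As Q approaches the unit circle, the "distance" from the center grows without bound.
for r in (0.9, 0.99, 0.999, 0.9999):
    print(r, klein_distance(center, (r, 0.0)))

# Along a "line" (here a diameter) the "distance" is additive.
a, b, c = (0.0, 0.0), (0.2, 0.0), (0.6, 0.0)
print(klein_distance(a, b) + klein_distance(b, c), klein_distance(a, c))
```

Under this assumption the distance from the center to a point at Euclidean radius r is artanh r, which tends to infinity as r tends to 1, in line with the claim that the “plane” is infinite.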

The pseudosphere is not entirely superseded by the projective model, 
since it remains the source of “real” distances and angles, whereas those in 
the projective model are necessarily distorted. One of the distinctive curves 
of the hyperbolic plane, the horocycle, or circle with center at infinity, is 
shown particularly clearly on the pseudosphere. If one imagines, following 
Beltrami (1868a), the pseudosphere wrapped by infinitely many turns of 
an infinitely thin covering, then the edge of this covering (along the rim 
of the pseudosphere) is a horocycle. The middle picture of Figure 13.15 
shows the image of one turn of the covering, drawn solidly, and horocycles 
resulting from continued unwrapping are shown as dashed lines. 


EXERCISES 


Klein’s three pictures illustrate the three types of rigid motion of the hyper- 
bolic plane. 


1. Rotation, in which one point of the plane is fixed and all other points move 
in hyperbolic circles about it. (A hyperbolic circle is the locus of a point 
moving at constant “distance” from a fixed point.) 


2. Limit rotation, in which a point at infinity is fixed and all points of the plane 
move in horocycles centered on the fixed point at infinity. 


3. Translation, in which a “line” moves along itself and the other points of the 
plane move along its equidistant curves. (An equidistant curve is the locus 
of a point moving at constant “distance” from a “line.”)


13.7.1 Pick out hyperbolic circles and equidistant curves in the top and bottom
pictures in Figure 13.15. 


13.7.2 If the center of rotation in the top picture were not at the center of the disk, 
do you think the hyperbolic circles would be Euclidean circles? 


13.7.3 Observe that equidistant curves at nonzero “distance” from the invariant 
“line” are not “lines” (necessarily so, in view of al-Haytham’s equivalent 
of axiom $P_1$ mentioned in Section 13.5). Does the translation move a point
on an equidistant curve farther than a point on the invariant line? 


13.7.4 Give an example of three points in the hyperbolic plane, not in a “line,” 
that do not lie on a hyperbolic circle. (If this problem proves difficult, try it 
again after reading the next section.) 


13.8 Beltrami’s Conformal Models 


The projective model of the hyperbolic plane distorts angles as well as 
lengths. One can see this with the asymptotic geodesics on the pseudo- 
sphere, which clearly tend to tangency at infinity yet are mapped onto lines 
meeting at a nonzero angle at the boundary of the unit disk (Figure 13.15). 
Beltrami (1868b) found that models with true angles—the so-called con- 
formal models—can be obtained by sacrificing straightness of “lines.” His 
basic conformal model is not, in fact, part of the plane but part of a hemi- 
sphere. It is erected over the projective model and its “lines” are vertical 
sections of the hemisphere (hence semicircles) over the “lines” of the pro- 
jective model (Figure 13.17). The “distance” between points on the hemi- 
sphere is defined to be the “distance” between the points beneath them in 
the projective model. Later we will see that “distance” on the hemisphere 
also has a simple direct definition. 


Figure 13.17: From the projective disk model to the hemisphere 


The hemisphere model gives two planar conformal models by stereo-
graphic projection onto the tangent plane opposite the point of projection.
As we know from Section 12.2, stereographic projection preserves angles
and sends circles to circles. The first model is a disk (Figure 13.18) that, 
by change of scale, can again be taken as the unit disk. (The lightbulb 
represents the point of projection, at the top of the sphere whose bottom 
hemisphere is shown.) The second (Figure 13.19) is a half-plane, which 
we take to be the upper half-plane, y > 0. Since the “lines” in the hemi- 
sphere model are circular and orthogonal to the equator, “lines” in the 
planar conformal models are again circular, orthogonal to the boundary of 
the disk and half-plane, respectively, or straight lines in exceptional cases. 
To avoid continual mention of these exceptional cases—namely, line seg- 
ments through the disk center and lines x = constant in the half-plane—we 
consider lines to be circles of infinite radius. 




Figure 13.18: From the hemisphere to the conformal disk model 


One of the beauties of the conformal models is that other important 
curves—hyperbolic “circles,” horocycles, and equidistant curves—are also 
real circles. Each curve equidistant from a given “line” L is a circle through
the endpoints of L on the boundary. Horocycles are circles tangential to the 
boundary and also, in the half-plane model, the lines y = constant. A cir- 
cle not meeting the boundary is a hyperbolic “circle,” but its “center,” at 
equal “distance” from all its points, is not at the Euclidean center. Figure 
13.20 shows some of these curves. They are imprinted on a tessellation of 
the half-plane by triangles with angles π/2, π/3, and 0, called the modular
tessellation because it depicts the periodicity of the modular function. 


The triangles of the modular tessellation are bounded by “lines” and

Figure 13.19: From the hemisphere to the half-plane model


they are in fact congruent in the sense of hyperbolic “distance.” This shows 
again that the boundary is infinitely far away, because there are infinitely 
many triangles below any point in the open half-plane. Note also that 
asymptotic “lines” are tangential at “infinity” (the boundary) and that the 
boundary is their common perpendicular, thus resolving the situation that 
Saccheri (Section 13.5) thought to be contradictory. 


“Distance” is particularly easy to express in the half-plane model. The 
“distance” ds between neighboring points (x, y) and (x + dx, y + dy) is 


$$ds = \frac{\sqrt{dx^2 + dy^2}}{y},$$

that is, the Euclidean distance divided by y. Thus “distance” → ∞ as a
point approaches the boundary y = 0 of the half-plane, as expected. For 
constant x, integration along a vertical line shows that “distance” increases 
exponentially relative to Euclidean distance as y decreases. For example, 
when x = 0 and y = 1, 1/2, 1/4, ..., the “distances” between successive points
are equal. The formula for ds was first obtained by Liouville (1850) by 
directly mapping the pseudosphere into the half-plane. The “distance” for- 
mula for the conformal disk was also found before Beltrami, by Riemann 
(1854b), but neither Liouville nor Riemann saw the hyperbolic geometry. 
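The integration along a vertical line mentioned above can be sketched numerically. On x = constant we have dx = 0, so ds = dy/y, and the “distance” between heights y1 and y2 is |log(y2/y1)|. The midpoint rule and the helper name `vertical_distance` are choices made for this illustration only.

```python
import math

def vertical_distance(y1, y2, steps=100_000):
    """Numerically integrate ds = dy / y along a line x = constant
    (midpoint rule) between heights y1 and y2."""
    lo, hi = sorted((y1, y2))
    h = (hi - lo) / steps
    return sum(h / (lo + (i + 0.5) * h) for i in range(steps))

heights = [1, 1/2, 1/4, 1/8]
gaps = [vertical_distance(a, b) for a, b in zip(heights, heights[1:])]
print(gaps)          # each gap is close to log 2
print(math.log(2))
```

Halving y always adds the same amount, log 2, to the “distance,” which is the sense in which “distance” grows exponentially relative to Euclidean distance near the boundary.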


Beltrami (1868b) not only found these models, in a unified way, but 
also extended the idea to n dimensions. For example, he gave a model of 
the three-dimensional space considered by Bolyai and Lobachevsky as the


Figure 13.20: Some curves in the half-plane model 


upper half, z > 0, of ordinary (x, y, z)-space, with “distance” 


$$ds = \frac{\sqrt{dx^2 + dy^2 + dz^2}}{z}.$$


“Lines” are then semicircles orthogonal to z = 0 and “planes” are hemi- 
spheres orthogonal to z = 0. Restricting the “distance” function to such a 
hemisphere turns out to give Beltrami’s hemisphere model. Thus the hemi- 
sphere model can be viewed as a hyperbolic plane lying in hyperbolic 3- 
space. The horospheres of the half-space model are spheres tangential to 
z = 0, together with the planes z = constant. Beltrami (1868b) pointed out 
that on z = constant we have 


$$ds = \frac{\sqrt{dx^2 + dy^2}}{\text{constant}},$$


that is, “distance” is proportional to Euclidean distance. Thus he had an 
immediate proof of Wachter’s wonderful theorem that the geometry of the 
horosphere is Euclidean. 


EXERCISES 


The mapping of the pseudosphere into the half-plane may be carried out as 
follows, using the parametric equations for the tractrix found in Exercise 13.2.5: 


$$x = \sigma - \tanh\sigma, \qquad y = \operatorname{sech}\sigma.$$

First we replace the parameter σ by the arc length τ along the tractrix.


13.8.1 Show that $\tau = \int_0^x \sqrt{1 + (dy/dx)^2}\,dx = \log\cosh\sigma$, and hence $y = e^{-\tau}$.


Now take τ and the angle X of rotation as the coordinates on the pseudosphere
obtained by rotating the tractrix about the x-axis. 


13.8.2 Show that the length subtended by angle dX on a circular cross section of 
the pseudosphere is 
$$y\,dX = e^{-\tau}\,dX,$$

and hence the distance between nearby points (X, τ) and (X + dX, τ + dτ)
on the pseudosphere is given by

$$ds^2 = e^{-2\tau}\,dX^2 + d\tau^2.$$


13.8.3 Finally, introduce the variable $Y = e^{\tau}$ and conclude that $ds^2 = \dfrac{dX^2 + dY^2}{Y^2}$.


Thus the pseudosphere is mapped into the (X, Y)-plane with preservation of 
distance, provided distance in the (X, Y) plane is defined by 


$$ds = \frac{\sqrt{dX^2 + dY^2}}{Y}.$$


It follows, from what was said above, that geodesics on the pseudosphere corre- 
spond to semicircles with centers on the X-axis. This throws some light on the 
problem raised in Section 13.4—describing geodesics on the pseudosphere. 
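The arc-length claim in Exercise 13.8.1 can be sanity-checked numerically before it is proved. The sketch below approximates the tractrix by a polygon and compares its length with log cosh σ; the helper names and the polygonal approximation are assumptions of this illustration, not part of the exercise.

```python
import math

def tractrix(sigma):
    """Point of the tractrix with parameter sigma: (sigma - tanh sigma, sech sigma)."""
    return sigma - math.tanh(sigma), 1 / math.cosh(sigma)

def arc_length(sigma, steps=20_000):
    """Length of the tractrix from parameter 0 to sigma, via a fine polygon."""
    total = 0.0
    px, py = tractrix(0.0)
    for i in range(1, steps + 1):
        x, y = tractrix(sigma * i / steps)
        total += math.hypot(x - px, y - py)
        px, py = x, y
    return total

for sigma in (0.5, 1.0, 2.0):
    tau = arc_length(sigma)
    _, y = tractrix(sigma)
    # tau should be close to log cosh sigma, and y close to e^(-tau)
    print(sigma, tau, math.log(math.cosh(sigma)), y, math.exp(-tau))
```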


13.8.4 Explain why the region of the (X, Y)-plane corresponding to the pseudo-
sphere is bounded by X = 0 and X = 2π and it lies above some Y =
constant > 0. 


13.8.5 By considering a semicircle crossing the region described in Exercise 13.8.4, 
show that there is no smooth closed geodesic on the pseudosphere. 


13.9 The Complex Interpretations 


One of the characteristics of the Euclidean plane is the existence of regu- 
lar tessellations: tilings of the plane by regular polygons. There are three 
such tilings, based on the square, equilateral triangle, and regular hexagon 
(Figure 13.21). 

Associated with each tiling is a group of rigid motions of the plane that 
maps the tiling pattern onto itself. For example, the unit square pattern is 
mapped onto itself by unit translations parallel to the x and y axes and by 
the rotation of π/2 about the origin, and these three motions generate all

Figure 13.21: Tessellations of the Euclidean plane


motions of the tessellation onto itself. If we write z = x + iy, then these 
generating motions are given by the transformations 


$$z \mapsto z + 1, \qquad z \mapsto z + i, \qquad z \mapsto zi.$$

The triangle and hexagon tessellations have a group generated by

$$z \mapsto z + 1, \qquad z \mapsto z + \tau, \qquad z \mapsto z\tau,$$

where $\tau = e^{i\pi/3}$ is the third vertex of the equilateral triangle whose other
vertices are at 0, 1 (Figure 13.22). In fact, any motion of the Euclidean
plane can be composed from translations $z \mapsto z + a$ and rotations $z \mapsto ze^{i\theta}$.
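The complex form of Euclidean motions is easy to experiment with. The following sketch builds a motion from translations and a rotation and checks that it preserves Euclidean distance; the helper names `translation`, `rotation`, and `compose` are hypothetical and chosen only for this illustration.

```python
import cmath

def translation(a):
    """The motion z -> z + a."""
    return lambda z: z + a

def rotation(theta):
    """The motion z -> z * e^(i*theta), a rotation about the origin."""
    return lambda z: z * cmath.exp(1j * theta)

def compose(*motions):
    """Compose motions right to left: compose(f, g)(z) = f(g(z))."""
    def composed(z):
        for m in reversed(motions):
            z = m(z)
        return z
    return composed

tau = cmath.exp(1j * cmath.pi / 3)   # third vertex of the triangle on 0, 1
motion = compose(translation(1), rotation(cmath.pi / 3), translation(tau))

z, w = 0.3 + 0.2j, -1.0 + 2.5j
print(abs(z - w), abs(motion(z) - motion(w)))   # the two distances agree
print(abs(tau**2 - (tau - 1)))                  # near 0: z -> z*tau sends tau to tau - 1
```

The last line illustrates why z ↦ zτ maps the triangle tessellation onto itself: it sends the vertex τ to τ − 1, which is again a sum of the generating translations.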




Figure 13.22: Relation between the triangle and hexagon tessellations 


The sphere also has a finite number of regular tessellations, obtained 
by central projections of the regular polyhedra (Section 2.2). Figure 13.23 
shows a tessellation corresponding to the icosahedron. (Each face has been 
further subdivided into six congruent triangles.) The motions mapping
such a tessellation onto itself can be expressed as complex transforma-
tions by interpreting the sphere as C ∪ {∞} via stereographic projection
(Section 11.6). Gauss (1819) found that any motion of the sphere can be
expressed by a transformation of the form

$$z \mapsto \frac{az + b}{-\bar{b}z + \bar{a}},$$

where a, b ∈ C and an overbar denotes the complex conjugate.


Figure 13.23: Icosahedral tessellation of the sphere 


The conformal models of the hyperbolic plane can be regarded as parts 
of C: the unit disk {z : |z| < 1} and the half-plane {z : Im(z) > 0}. Their 
rigid motions, being conformal transformations, are complex functions, 
and Poincaré (1882) made the beautiful discovery that they are of the form 


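$$z \mapsto \frac{az + b}{\bar{b}z + \bar{a}} \quad \text{for the disk, and}$$

$$z \mapsto \frac{\alpha z + \beta}{\gamma z + \delta} \quad \text{for the half-plane,}$$

where α, β, γ, δ ∈ R. Notice that the latter, with x in place of z, are the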
transformations of the projective line studied in Section 7.6. Thus the “line 
at infinity” of the hyperbolic plane is a projective line. 
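Poincaré's description of the half-plane motions can be tested numerically. The sketch below makes two assumptions not stated in the text: the condition αδ − βγ > 0 (needed for the map to send the upper half-plane to itself rather than to the lower one), and the standard closed form arccosh(1 + |z − w|²/(2 Im z Im w)) for the half-plane “distance.” The helper names are hypothetical.

```python
import math

def mobius(alpha, beta, gamma, delta):
    """z -> (alpha*z + beta)/(gamma*z + delta) with real coefficients."""
    if alpha * delta - beta * gamma <= 0:
        raise ValueError("need alpha*delta - beta*gamma > 0 to preserve Im(z) > 0")
    return lambda z: (alpha * z + beta) / (gamma * z + delta)

def half_plane_distance(z, w):
    """Assumed closed form for the half-plane 'distance' (curvature -1)."""
    return math.acosh(1 + abs(z - w) ** 2 / (2 * z.imag * w.imag))

f = mobius(2, 1, 1, 3)
z, w = 0.5 + 1.0j, -2.0 + 0.25j

print(f(z).imag > 0 and f(w).imag > 0)   # images stay in the upper half-plane
print(half_plane_distance(z, w), half_plane_distance(f(z), f(w)))   # equal "distances"
```

The assumed formula agrees with ds = √(dx² + dy²)/y on vertical lines: it gives log 2 for the points i and i/2.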

Infinitely many regular tessellations are possible, since the angles of a 
polygon can be made arbitrarily small by increasing its area. For example, 
there are tessellations by equilateral triangles in which n triangles meet at 
each vertex, for each n ≥ 7, and similar variety occurs for other polygons
(see exercises). Some of these tessellations were known before Poincaré
(1882) gave the complex interpretation of hyperbolic geometry, and even
before models of hyperbolic geometry were known. Figure 13.24 shows a
tessellation by equilateral triangles of angle π/4 found in unpublished, and
unfortunately undated, work of Gauss (Werke, vol. VIII, p. 104). 


Figure 13.24: The Gauss tessellation 


Others arise from differential equations and were discovered in this 
context by Riemann (1858b) and Schwarz (1872) (the first published exam- 
ple, Figure 13.25). By explaining these tessellations in terms of hyperbolic 
geometry, Poincaré (1882) showed that hyperbolic geometry was part of 
existing mathematics. 


Figure 13.25: The Schwarz tessellation 


In a subsequent paper, Poincaré (1883) explained the geometric nature
of linear fractional transformations,

$$z \mapsto \frac{az + b}{cz + d},$$


special cases of which, as we have just seen, express the rigid motions of 
the two-dimensional Euclidean, spherical, and hyperbolic geometries. He 
showed that each linear fractional transformation of the plane C is induced 
by a hyperbolic motion of the three-dimensional half-space whose “plane 
at infinity” is C; thus Poincaré’s theorem embraces those of Wachter and 
Beltrami on the representation of two-dimensional Euclidean, spherical, 
and hyperbolic geometry within three-dimensional hyperbolic geometry. 


EXERCISES 
13.9.1 Show that a triangle in the hyperbolic plane can have any angle sum < π.
13.9.2 Deduce that there are equilateral triangles with angle 2π/n for each n ≥ 7.


13.9.3 Also deduce that triangles with angle zero exist, in a certain sense, and that
their area is finite. 


13.9.4 Find corresponding results for regular n-gons. 


14 Group Theory


PREVIEW 


Group theory was the first branch of modern, or abstract, algebra to emerge 
from the old algebra of equations. Group theory today is often described as 
the theory of symmetry, and indeed groups have been inherent in symmet- 
ric objects since ancient times. However, extracting algebra from a sym- 
metric object is a highly abstract exercise, and groups first appeared in 
situations where some algebra was already present. 

One of the first nontrivial examples was the group of integers mod 
p, for prime p, used by Euler (1758) to prove Fermat’s little theorem. Of 
course, Euler had no idea that he was using a group. But he did use one of 
the characteristic group properties, namely, the existence of inverses. 

Likewise, Lagrange (1771) was not aware of the group concept when 
he studied permutations of the roots of equations. But he was using the 
group $S_n$ of permutations of n things, and some of its subgroups.

It was Galois (1831a) who first truly grasped the group concept, and he 
used it brilliantly to explain what makes an equation solvable by radicals. 
In particular, he was able to explain why the general quintic equation is 
not solvable by radicals. These discoveries changed the face of algebra, 
though few mathematicians realized it at first. 

In the second half of the 19th century the group concept spread from 
algebra to geometry, following the observation of Klein (1872) that each 
geometry is characterized by a group of transformations.


14.1 The Group Concept


The notion of group is one of the most important unifying ideas in math- 
ematics. It draws together a wide range of mathematical structures for 
which a notion of combination, or “product,” exists. Such products include 
the ordinary arithmetical product of numbers, but a more typical example 
is the product, or composition, of functions. If f and g are functions, then 
fg is the function whose value for argument x is f(g(x)). (Thus fg means
“apply g, then f.” We have to pay attention to order because in general
gf ≠ fg.)

A group G is defined formally to be a set with an operation, usually 
called product and denoted by juxtaposition, a specific element called the 
identity and written 1, and, for each g ∈ G, an element called the inverse
of g and written $g^{-1}$, with the following properties:


(i) $g_1(g_2g_3) = (g_1g_2)g_3$ for all $g_1, g_2, g_3 \in G$ (associative property)
(ii) $g1 = 1g = g$ for all $g \in G$ (identity property)
(iii) $gg^{-1} = g^{-1}g = 1$ for all $g \in G$ (inverse property)


These axioms emerged gradually in the course of experience with partic- 
ular groups. The stories of some of these groups will be recounted below. 
In practice, properties (i) and (ii) are usually evident, and it is more impor- 
tant to ensure that the product operation is merely defined for all elements 
of G. Many mathematical concepts have been created in response to the 
desire, at first unconscious, for products to exist. 
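For a finite set, the definition above (including the requirement that the product be defined within the set) can be checked by brute force. The function name `is_group` and its argument conventions are hypothetical, chosen only for this sketch.

```python
from itertools import product

def is_group(elements, op, identity, inverse):
    """Check closure and properties (i)-(iii) for a finite set with
    product `op`, identity element `identity`, and inverse map `inverse`."""
    elements = list(elements)
    closed = all(op(a, b) in elements for a, b in product(elements, repeat=2))
    associative = all(
        op(a, op(b, c)) == op(op(a, b), c)
        for a, b, c in product(elements, repeat=3)
    )
    has_identity = all(op(a, identity) == a == op(identity, a) for a in elements)
    has_inverses = all(
        op(a, inverse(a)) == identity == op(inverse(a), a) for a in elements
    )
    return closed and associative and has_identity and has_inverses

# The numbers 0, ..., 5 under addition modulo 6 form a group.
print(is_group(range(6), lambda a, b: (a + b) % 6, 0, lambda a: (-a) % 6))

# The numbers 0, ..., 3 under multiplication modulo 4 do not: 0 has no inverse.
print(is_group(range(4), lambda a, b: (a * b) % 4, 1, lambda a: a))
```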

For example, we saw in Section 7.2 that a perspective view of a per- 
spective view is not generally a perspective view. So if the “product” fg of 
perspective transformations f and g means the result of performing g then 
f, then fg does not always belong to the set of perspective transforma- 
tions. The set of projective transformations is the smallest extension of the 
set of perspective transformations to a set on which the product is always 
defined, namely, the set of finite products of perspective transformations. 

In other instances, concepts have arisen from the desire to have inverses. 
Negative numbers, for example, can be viewed as extending the set 
{0, 1,2,3,...} of natural numbers to the set Z of integers, in which each 
element has an inverse under the + operation. (In cases like this one, where 
the group operation is naturally written as +, the identity element is writ-
ten 0 and the inverse of g is written −g.) Another example is the extension
of the line R to the real projective line $RP^1 = R \cup \{\infty\}$, which ensures
that each linear fractional function has an inverse. Likewise, extending the 
plane by points at infinity ensures that each projective transformation has 
an inverse, because points projected to infinity can be projected back again. 

Inverses sometimes appear unintentionally, as it were, in finite situ-
ations where repeated application of the group operation eventually pro-
duces the identity element. The simplest example is the cyclic group $Z_n$,
which consists of the numbers 0, 1, 2, ..., n − 1 under addition modulo n,
where numbers are called congruent modulo n when they differ by a multi-
ple of n. Here the identity element is 0, and n − 1 is the inverse of 1 because
their sum is congruent to 0, modulo n. Similarly, n − 2 is the inverse of 2,
n − 3 is the inverse of 3, and so on.

Perhaps the earliest nontrivial use of inverses occurs with multiplica- 
tion modulo p, which Euler (1758) (and possibly Fermat before him) used 
to give an essentially group-theoretic proof of Fermat’s little theorem. We 
proved this theorem with the help of the binomial theorem (and without 
using inverses) in Section 5.9. We now abbreviate “modulo” as “mod,” 
and assume p is prime. 

Since integers m and n are congruent mod p if they differ by an integer 
multiple of p, b is an inverse of a under multiplication mod p if ab is 
congruent to 1 modulo p, that is, if ab + kp = 1 for some integer k. Since 
p is prime, such a b exists for each a not a multiple of p, by applying the 
Euclidean algorithm to the relatively prime numbers a, p (Section 3.3). 
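The argument above is constructive: running the Euclidean algorithm on a and p while tracking coefficients produces integers b and k with ab + kp = 1, and b is then the inverse of a. A minimal sketch follows; the function names are hypothetical, and the iterative form of the extended algorithm is one common choice among several.

```python
def extended_gcd(a, b):
    """Return (g, s, t) with g = gcd(a, b) and s*a + t*b == g."""
    old_r, r = a, b
    old_s, s = 1, 0
    old_t, t = 0, 1
    while r != 0:
        q = old_r // r
        old_r, r = r, old_r - q * r
        old_s, s = s, old_s - q * s
        old_t, t = t, old_t - q * t
    return old_r, old_s, old_t

def inverse_mod(a, p):
    """Inverse of a under multiplication mod p, for a not a multiple of the prime p."""
    g, s, _ = extended_gcd(a % p, p)
    if g != 1:
        raise ValueError("a and p are not relatively prime, so a has no inverse")
    return s % p

p = 13
for a in range(1, p):
    b = inverse_mod(a, p)
    print(a, b, (a * b) % p)   # the last column is always 1
```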

Euler did not define a group in his proof, but it is easy for us to do so 
(and to rephrase his proof accordingly; see exercises). The group elements 
are the numbers 1, 2, ..., p − 1, and the product of a and b is defined to be
ab mod p, where


ab mod p =
the number among 1, 2, ..., p − 1 that is congruent to ab, mod p.


Group properties (i) and (ii) follow from ordinary arithmetic; (iii), as we 
have seen, follows from the Euclidean algorithm. 
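The group just described is small enough to tabulate. The sketch below prints the multiplication table for p = 7 (an arbitrary choice of prime) and checks two facts: every row is a rearrangement of 1, 2, ..., p − 1, so every element has an inverse, and a^(p−1) is congruent to 1 mod p, which is Fermat's little theorem. The name `product_mod` is hypothetical.

```python
def product_mod(a, b, p):
    """The group product: the number among 1..p-1 congruent to ab, mod p."""
    return (a * b) % p

p = 7
elements = range(1, p)

# Multiplication table of the group {1, ..., p-1}
for a in elements:
    print([product_mod(a, b, p) for b in elements])

# Each row is a rearrangement of 1..p-1, so each a has an inverse.
rows_are_permutations = all(
    sorted(product_mod(a, b, p) for b in elements) == list(elements)
    for a in elements
)
# Fermat's little theorem: a^(p-1) is congruent to 1 mod p.
fermat_holds = all(pow(a, p - 1, p) == 1 for a in elements)
print(rows_are_permutations, fermat_holds)
```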

The preceding examples illustrate the influence of geometry and num- 
ber theory on the group concept. An even more decisive influence was the 
theory of equations, which we look at briefly in Section 14.3. But first we 
need to understand a little about subgroups—the groups within a group— 
and when a subgroup may be said to “divide” a group. For a more detailed 
account of the development of the group concept, see Wussing (1984).


EXERCISES

A good introduction to inverses under multiplication mod p may be had with 
p = 5. There is no need to use the Euclidean algorithm to find these inverses—just 
multiply by numbers < 5 until a product congruent to 1 (mod 5) is obtained.


14.1.1 Find the inverses of 2, 3, and 4 under multiplication mod 5. 


Now here is the proof of Fermat’s little theorem using inverses mod p. Start with 
the nonzero numbers, mod p, 


1, 2, ..., p − 1,
and multiply them all by a nonzero a (mod p). 


14.1.2 Notice that if we multiply again by the inverse of a (mod p) we get back 
the numbers 


1, 2, ..., p − 1.


Why does this show that the numbers
a·1 mod p, a·2 mod p, ..., a·(p − 1) mod p
are distinct and nonzero? 


14.1.3 Deduce from Exercise 14.1.2 that if a is nonzero (mod p), then 
{a·1 mod p, a·2 mod p, ..., a·(p − 1) mod p}


is the same set as
{1, 2, ..., p − 1}.


14.1.4 Deduce from Exercise 14.1.3 that 


$$a^{p-1} \cdot 1 \cdot 2 \cdots (p-1) \bmod p = 1 \cdot 2 \cdots (p-1) \bmod p.$$


14.1.5 Finally, deduce that

$$a^{p-1} \bmod p = 1,$$

that is,

$$a^{p-1} \equiv 1 \pmod p$$

(Fermat’s little theorem; the version in Section 5.9 results from multiplying 
each side by a).


14.2 Subgroups and Quotients


The group concept was implicit in mathematics for a long time before it 
became explicit. The first substantial theorem of the subject, now known 
as Lagrange’s theorem, also came before the formalization of the group 
concept, but to state it we will take advantage of current terminology. 

A subset H of a group G is called a subgroup of G if H is also a 
group (under the same operation that makes G a group). For example, the 
set Z of integers is a subgroup of the group R of real numbers, under the 
addition operation. Lagrange’s theorem concerns the number of members 
of a group H, which we call the order of H and denote by |H|. It states that:

If H is a subgroup of a finite group G, then |H| divides |G|. 

Lagrange (1771) proved a special case. Jordan (1870) proved the gen- 
eral case and generously named the theorem after Lagrange. The proof 
depends on the notion of coset. For each g in G we have the left coset of H 


$$gH = \{gh_1, gh_2, \ldots, gh_k\}, \quad \text{where } H = \{h_1, h_2, \ldots, h_k\}.$$


In words, gH is the set that results from multiplying each member of H on 
the left by g. (There are right cosets Hg defined similarly, but we do not 
need them for this proof.) The key properties of cosets are: 


1. Each coset gH has |H| members, because we can recover the mem-
bers of H by multiplying each member of gH on the left by $g^{-1}$.


2. Any two different cosets $g_1H$ and $g_2H$ are disjoint. This is because,
if $g_1H$ and $g_2H$ have a common member g, we have

$$g = g_1h_1 = g_2h_2 \quad \text{for some } h_1, h_2 \text{ in } H.$$

But then

$$g_1 = g_2h_2h_1^{-1} \quad (\text{multiplying on the right by } h_1^{-1}),$$

whence

$$g_1H = g_2(h_2h_1^{-1}H) = g_2H,$$

since $h_2h_1^{-1}$ is a member of H, and multiplying H by any one of its
members gives back H.


It follows from these two properties that G can be split into disjoint
sets gH, each of size |H|, so |H| divides |G|. □
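The two coset properties used in the proof can be observed directly in a small example. The sketch below takes G to be the six permutations of three things and H a two-element subgroup; representing permutations as tuples and the helper names `compose` and `left_cosets` are choices made for this illustration.

```python
from itertools import permutations

def compose(f, g):
    """Product fg of permutations given as tuples: apply g, then f."""
    return tuple(f[g[i]] for i in range(len(g)))

def left_cosets(G, H):
    """Return the set of distinct left cosets gH, each as a frozenset."""
    return {frozenset(compose(g, h) for h in H) for g in G}

G = list(permutations(range(3)))      # the 6 permutations of three things
H = [(0, 1, 2), (1, 0, 2)]            # the identity and one swap
cosets = left_cosets(G, H)

print(len(G), len(H), len(cosets))    # 6 2 3
for c in cosets:
    print(sorted(c))                  # each coset has |H| members
```

The distinct cosets are disjoint, each has |H| = 2 members, and together they exhaust G, so 2 divides 6.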
Under certain conditions, it makes sense to multiply cosets by the rule
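$$g_1H \cdot g_2H = g_1g_2H.$$


For this rule to make sense, we must get the same answer $g_1'g_2'H = g_1g_2H$
whenever $g_1'H = g_1H$ and $g_2'H = g_2H$. This happens when gH = Hg for
each g in G because, under this condition,

$$\begin{aligned}
g_1'g_2'H &= g_1'Hg_2' && \text{because } g_2'H = Hg_2', \\
&= g_1Hg_2' && \text{because } g_1'H = g_1H, \\
&= g_1g_2'H && \text{because } g_2'H = Hg_2', \\
&= g_1g_2H && \text{because } g_2'H = g_2H. \qquad \Box
\end{aligned}$$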


We call H a normal subgroup of G if it satisfies the condition gH = Hg 
for each g in G, and in this case the cosets form a group called G/H, the 
quotient of G by H. The group properties are inherited from G, as is easy 
to check (see exercises). 
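Both the normality condition gH = Hg and the requirement that the coset product not depend on the chosen representatives can be tested by brute force in a small group. The sketch below again uses the six permutations of three things, with permutations stored as tuples; all helper names are hypothetical.

```python
from itertools import permutations

def compose(f, g):
    """Product fg of permutations given as tuples: apply g, then f."""
    return tuple(f[g[i]] for i in range(len(g)))

def coset(g, H):
    """The left coset gH as a frozenset."""
    return frozenset(compose(g, h) for h in H)

def is_normal(G, H):
    """Check gH = Hg for each g in G."""
    return all(coset(g, H) == frozenset(compose(h, g) for h in H) for g in G)

def product_is_well_defined(G, H):
    """Does g1H . g2H = g1g2H give the same coset for every choice of representatives?"""
    return all(
        coset(compose(a1, a2), H) == coset(compose(b1, b2), H)
        for a1 in G for b1 in G if coset(a1, H) == coset(b1, H)
        for a2 in G for b2 in G if coset(a2, H) == coset(b2, H)
    )

G = list(permutations(range(3)))
rotations = [(0, 1, 2), (1, 2, 0), (2, 0, 1)]   # a three-element subgroup
swap = [(0, 1, 2), (1, 0, 2)]                   # a two-element subgroup

print(is_normal(G, rotations), product_is_well_defined(G, rotations))   # True True
print(is_normal(G, swap), product_is_well_defined(G, swap))             # False False
```

In this example the coset product is well defined exactly when the subgroup is normal.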

If G has the property that gg’ = g’g for all g, g’ in G (in which case we 
call G abelian, for reasons that will be explained in the next section), then 
obviously gH = Hg for any subgroup H. This means that any subgroup 
H of an abelian group G is normal, and we can form the quotient group 
G/H. The concept of normal subgroup is therefore interesting only when 
G is not an abelian group. In this case, the first step towards understanding 
the structure of G is to look for normal subgroups. 

All this was first understood and made explicit by Galois, whose work 
we introduce in the next section. 


EXERCISES 


The group properties of G/H follow from the definition of the product of
cosets, $g_1H \cdot g_2H = g_1g_2H$.


14.2.1 Show that
$g_1H \cdot (g_2H \cdot g_3H) = (g_1H \cdot g_2H) \cdot g_3H$ if and only if $g_1(g_2g_3) = (g_1g_2)g_3$;
hence associativity in G/H follows from associativity in G. 

14.2.2 Show that H = 1H is the identity element of G/H. 


14.2.3 What is the inverse of gH in G/H? Explain your answer.
Figure 14.1: The symmetries of the equilateral triangle 


The smallest nonabelian group is a group of six elements that may be viewed as 
the symmetries of the equilateral triangle. If we fix a position of the triangle, then 
there are six motions of it (including the “motion” that does not change it at all) 
leading to a position where it looks the same as it did before. These motions can 
be distinguished by where they send the vertices A, B, and C (Figure 14.1). 

The six motions form a group (called $S_3$ for reasons that will be given in the
next section) under the operation of combining motions. We combine motions by 
viewing each motion as a function f(P) of points P in the triangle, so “do g, then
f” means to form the function fg(P), as mentioned in Section 14.1.


14.2.4 Why are there only six motions leading to positions that look the same? 
Why is this group not abelian? 


14.2.5 A subgroup H of $S_3$ consists of three rotations, through 0°, 120°, and 240°,
represented by the pictures in the top row. 


14.2.6 The bottom row of the picture represents a coset gH for some g in $S_3$.
Describe the motion g, and verify that Hg is the same set as gH. 


14.2.7 Show that any subgroup H with only two cosets in a group G is a normal 
subgroup. 


14.3 Permutations and Theory of Equations


We saw in Section 5.8 that, as early as 1321, Levi ben Gershon found 
that there are n! permutations of n things. These permutations are invert-
ible functions that form a group $S_n$ (called the symmetric group) under
composition, though their behavior under composition was not considered
until the 18th century. It was when the idea of permutation was applied to 
the roots of polynomial equations, by Vandermonde (1771) and Lagrange 
(1771), that the first truly group-theoretic properties of permutations came 
to light. At the same time, Vandermonde and Lagrange found the key to 
understanding the solution of equations by radicals. 

They began with the observation that if an equation 


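$$x^n + a_1x^{n-1} + \cdots + a_{n-1}x + a_n = 0 \tag{1}$$

has roots $x_1, x_2, \ldots, x_n$, then

$$x^n + a_1x^{n-1} + \cdots + a_{n-1}x + a_n = (x - x_1)(x - x_2)\cdots(x - x_n). \tag{2}$$


By multiplying out the right-hand side and comparing coefficients one
finds that the $a_i$ are certain functions of $x_1, x_2, \ldots, x_n$. For example,

$$a_n = (-1)^n x_1x_2\cdots x_n,$$

$$a_1 = -(x_1 + x_2 + \cdots + x_n).$$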


These functions are symmetric, that is, unaltered by any permutation of 
$x_1, x_2, \ldots, x_n$, since the right-hand side of (2) is unaltered by such permu-
tations. Consequently, any rational function of $a_1, a_2, \ldots, a_n$ is symmetric
as a function of $x_1, x_2, \ldots, x_n$. Now the object of solution by radicals is to
apply rational operations and radicals to $a_1, a_2, \ldots, a_n$ so as to obtain the
roots, which are the completely asymmetric functions $x_i$.
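The symmetry of the coefficients can be seen concretely by expanding the product (x − x_1)(x − x_2)⋯(x − x_n) for the same roots taken in every possible order. The sketch below uses integer roots so that the comparison is exact; the function name and the choice of roots 2, 3, 5 are assumptions of this illustration.

```python
from itertools import permutations

def coefficients_from_roots(roots):
    """Expand (x - x_1)(x - x_2)...(x - x_n) and return [1, a_1, ..., a_n]."""
    coeffs = [1]
    for r in roots:
        # Multiply the current polynomial by (x - r): shift for x, subtract r times it.
        coeffs = [c - r * prev for c, prev in zip(coeffs + [0], [0] + coeffs)]
    return coeffs

roots = (2, 3, 5)
expected = coefficients_from_roots(roots)
print(expected)   # [1, -10, 31, -30]

# The coefficients are unaltered by every permutation of the roots.
print(all(coefficients_from_roots(p) == expected for p in permutations(roots)))
```

Here a_1 = −(2 + 3 + 5) = −10 and a_3 = (−1)³ · 2 · 3 · 5 = −30, matching the general formulas for the first and last coefficients.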

Radicals must therefore reduce symmetry in some way, and one can 
see how in the quadratic case. The roots of 


$$x^2 + a_1x + a_2 = (x - x_1)(x - x_2) = 0$$

are

$$x_1, x_2 = \frac{-a_1 \pm \sqrt{a_1^2 - 4a_2}}{2} = \frac{(x_1 + x_2) \pm \sqrt{x_1^2 - 2x_1x_2 + x_2^2}}{2},$$

and we notice that the symmetric functions $x_1 + x_2$ and $x_1^2 - 2x_1x_2 + x_2^2$
yield the two asymmetric functions $x_1, x_2$ when the two-valued radical
is introduced. In general, a radical $\sqrt[p]{\phantom{x}}$ multiplies the number of values of
the function by p and divides symmetry by p, in the sense that the group
of permutations leaving the function unaltered is reduced to 1/p of its
previous size. 

Vandermonde and Lagrange found they could explain the previous 
solutions of cubic and quartic equations in terms of such symmetry reduc-
tion in the corresponding permutation groups, $S_3$ and $S_4$. They also found
some properties of subgroups, such as Lagrange’s theorem mentioned in
the previous section. However, they did not understand the relation between
radicals and subgroups of $S_n$ well enough to handle equations of degree
≥ 5. Ruffini (1799) and Abel (1826) made enough progress with $S_5$ to be
able to prove the unsolvability of the quintic, but they did not get beyond 
this. None of these authors were aware of the group concept, and it is only 
with hindsight that we can interpret their results in group-theoretic terms. 

The concept, and indeed the word “group,” is due to Galois (1831b). 
Along with it, Galois introduced the concept of normal subgroup, which 
finally unlocks the secret of solvability by radicals. Galois showed that 
each equation E has a group $G_E$ consisting of the permutations of the
roots that leave rational functions of the coefficients unaltered, and that
the reduction of symmetry caused by introduction of a radical corresponds
to formation of a normal subgroup. More precisely, an equation E is
solvable by radicals if and only if there is a chain of subgroups

$$G_E = H_1 \supseteq H_2 \supseteq \cdots \supseteq H_k = \{1\}$$

such that each $H_{i+1}$ is a normal subgroup of $H_i$ and $H_i/H_{i+1}$ is cyclic.
(Moreover, if $H_i/H_{i+1}$ is cyclic of order n then the step from $H_i$ to $H_{i+1}$
corresponds to introduction of an nth root.) Such a group $G_E$ is now called
solvable because it signals solvability of the corresponding equation. 

Examples of solvable groups are $S_3$ and $S_4$, as one would expect from
the known solvability of the corresponding equations. Also, it is easy to see
that all finite abelian groups are solvable, so each equation with an abelian
group is solvable by radicals, a result due to Abel (1829). This is why we
call such groups “abelian.” If $E$ is the general equation of degree $n$, then
$G_E = S_n$, and the theorem of Ruffini and Abel is recovered by showing that
$S_n$ is not solvable for $n \geq 5$ (see, for example, Dickson (1903)).

This brief sketch of Galois’s ideas covers only a part of his theory. 
Another part is his theory of fields, which is needed to clarify the notion 
of rational function. We take up the theory of fields in Chapter . Group 
theory and field theory make up what is currently known as Galois theory
(see, for example, Edwards (1984)). What one might consider to be the
summit of Galois’s theory, the solution of equations by elliptic and related 
functions, is currently a fairly remote specialty. It appears in earlier books 
such as Jordan (1870) and Klein (1884), and more recently in McKean and 
Moll (1997). The greatest triumph of this theory was the solution of the 
general quintic equation by elliptic modular functions in Hermite (1858), 
following a hint in Galois (1831a) (see also Section 14.8).


EXERCISES 


The simplest type of permutation is a transposition, which swaps two things 
and leaves the others fixed. 


14.3.1 Show that any permutation is a product of transpositions, that is, any arrange- 
ment of n things may be achieved by repeated swaps. 


The group $S_n$ of all permutations of $n$ things has an important subgroup $A_n$
consisting of the even permutations. An even permutation $f$ of $\{1, 2, \ldots, n\}$ is
one with an even number of inversions, that is, pairs $(i, j)$ for which $i < j$ and
$f(i) > f(j)$ (Cramer (1750), p. 658).

Evenness can be seen by placing the numbers $1, 2, \ldots, n$ in two rows, one
above the other, and drawing a line from $k$ in the top row to $f(k)$ in the bottom
row. Figure 14.2 shows the permutation $f(1) = 2$, $f(2) = 3$, $f(3) = 1$ in this way.


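[Diagram: top row 1 2 3 and bottom row 1 2 3, with a line from each k in the top row to f(k) in the bottom row.]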


Figure 14.2: A permutation diagram 


14.3.2 Explain why a permutation is even if and only if its diagram has an even 
number of crossings. 


14.3.3 Show that the product of even permutations is even, so the even
permutations of $\{1, 2, \ldots, n\}$ form a group $A_n$. (It is called the alternating group.)


14.3.4 Show that evenness does not depend on how the numbers $1, 2, \ldots, n$ are
assigned to the $n$ things. (Hint: if the numbers are permuted by $g$, show
that the permutation $f$ is replaced by the permutation $g^{-1} f g$.)


14.3.5 If $g$ is an odd permutation, that is, $g \in S_n - A_n$, show that the set
$gA_n = \{gf : f \in A_n\}$ is all the odd permutations in $S_n$; hence $A_n$ contains exactly
half the members of $S_n$.

It follows from Exercise 14.3.5, and Exercise 14.2.7, that $A_n$ is a normal
subgroup of $S_n$; hence we can always form the cyclic quotient $S_n/A_n = \mathbb{Z}_2$. Thus
the real problem in solving the general equation of degree $n$ is to “solve” $A_n$ by
finding normal subgroups inside it.
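The inversion count behind Exercises 14.3.3 and 14.3.5 is easy to experiment with. The following Python sketch is illustrative only: the names `inversions`, `is_even`, and `compose` are assumptions of this example, and a permutation $f$ of $\{1, \ldots, n\}$ is stored as the tuple $(f(1), \ldots, f(n))$. For $n = 4$ it checks that the even permutations are closed under product and make up exactly half of $S_n$.

```python
from itertools import permutations
from math import factorial


def inversions(f):
    """Number of pairs i < j with f(i) > f(j); f is the tuple (f(1), ..., f(n))."""
    n = len(f)
    return sum(1 for i in range(n) for j in range(i + 1, n) if f[i] > f[j])


def is_even(f):
    """A permutation is even when it has an even number of inversions."""
    return inversions(f) % 2 == 0


def compose(f, g):
    """The product f . g, read as 'g, then f'."""
    return tuple(f[g[i] - 1] for i in range(len(g)))


n = 4
S_n = list(permutations(range(1, n + 1)))
A_n = [f for f in S_n if is_even(f)]

# A_n holds exactly half of S_n ...
half = 2 * len(A_n) == factorial(n)
# ... and the product of two even permutations is again even.
closed = all(is_even(compose(f, g)) for f in A_n for g in A_n)
```

The permutation of Figure 14.2 is `(2, 3, 1)` in this encoding; it has two inversions and is therefore even.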

The group $S_3$ turns out to be solvable because its normal subgroup $A_3$ is
already cyclic. This can be seen by studying the permutations in $A_3$, but more
easily by interpreting $S_3$ geometrically.


14.3.6 Interpret the symmetry group of the equilateral triangle, discussed in the
previous exercise set, as the group $S_3$ of all permutations of three things.


14.3.7 Show that, under this interpretation, the cyclic subgroup of rotations is A3. 


The interpretation we speak of here is an example of what is technically called
an isomorphism between the triangle symmetry group and $S_3$. An isomorphism
is a one-to-one correspondence between the two groups that preserves the group
operation, thus establishing that the groups have the “same form.” (We used this
expression in the same sense in Section 12.5.) In calling the rotation subgroup
cyclic we also imply an isomorphism, namely, the one that pairs the rotations
through 0°, 120°, 240° with the members 0, 1, 2 of $\mathbb{Z}_3$ respectively.


14.3.8 Show that there is an isomorphism between the symmetry group of the
regular tetrahedron and $S_4$. To which symmetries do the members of $A_4$
correspond?


14.4 Permutation Groups 


Galois understood “group” to mean a group of permutations of a finite 
set, so his definition stated only that the product of two permutations in 
the group must again be a member of the group. Associativity, identity, 
and inverses were consequences of his assumptions, and indeed too obvi- 
ous to be considered important from his point of view. Galois’s work was 
published only in 1846, and by that time the theory of finite permutation 
groups had been taken up and systematized by Cauchy (1844). Cauchy 
likewise required only closure under product in his definition of group, 
but he recognized the importance of identity and inverses by introducing 
the notation of 1 for the identity and $f^{-1}$ for the inverse of $f$.

Cayley (1854) was the first to consider the possibility of more abstract 
group elements, and with it the need to postulate associativity. (Inciden- 
tally, a group operation for which associativity is not obvious is that defined 
by the chord construction on a cubic curve: see Sections 10.3 and 12.5.) 
He took group elements to be simply “symbols,” with a “product” of $A$
and $B$ written $A \cdot B$ and subject to the law $A \cdot (B \cdot C) = (A \cdot B) \cdot C$, and a
unique element 1 subject to the laws $A \cdot 1 = 1 \cdot A = A$. He still assumed that
each group was finite, however. This meant that the existence of inverses
did not have to be postulated, only the validity of cancellation.

The existence of inverses in a finite group $G$, as defined by Cayley,
follows from an argument used by Cauchy (1815) and developed more
fully in Cauchy (1844). If $A \in G$, then the powers $A^2, A^3, \ldots$ all belong to
$G$ and hence they eventually include a recurrence of the same element:

$$A^m = A^n \quad \text{where } m < n.$$

Then, assuming that it is valid to cancel $A^m$ from both sides, $A^{n-m}$ is the
identity element 1 and $A^{n-m-1}$ is the inverse of $A$.
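Cauchy's recurrence argument can be traced step by step on a concrete finite group. The sketch below is an illustration under stated assumptions, not a reconstruction of Cauchy's own presentation: group elements are permutations stored as tuples, and the helper names `compose`, `power`, and `inverse_by_powers` are invented for this example. It lists powers of $A$ until one repeats, then reads off the identity as $A^{n-m}$ and the inverse as $A^{n-m-1}$.

```python
def compose(f, g):
    """The product f . g of permutations of {1, ..., n}, read as 'g, then f'."""
    return tuple(f[g[i] - 1] for i in range(len(g)))


def power(a, k):
    """a^k for k >= 1, by repeated multiplication."""
    result = a
    for _ in range(k - 1):
        result = compose(result, a)
    return result


def inverse_by_powers(a):
    """Find m < n with a^m == a^n; return (a^(n-m), a^(n-m-1))."""
    seen = {}                      # maps each power already met to its exponent
    current, k = a, 1
    while current not in seen:
        seen[current] = k
        current = compose(current, a)
        k += 1
    m, n = seen[current], k        # now a^m == a^n with m < n
    identity = power(a, n - m)
    # If n - m == 1 then a is itself the identity, hence its own inverse.
    inverse = identity if n - m == 1 else power(a, n - m - 1)
    return identity, inverse


a = (2, 3, 4, 5, 1)                # the 5-cycle 1 -> 2 -> 3 -> 4 -> 5 -> 1
identity, a_inverse = inverse_by_powers(a)
```

In an infinite group the `while` loop need not terminate, which is exactly why inverses must be postulated there.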

The need to postulate inverses first arises with infinite groups, where 
this argument no longer holds. Geometry was historically the most impor- 
tant source of infinite groups, as we will see in Section 14.6. It was in 
extending Cayley’s abstract group theory to cover the symmetry groups of 
infinite tessellations that Dyck (1883) made the first mention of inverses in 
the definition of groups. We return to Dyck’s concept of group in Section 
14.7. 

A theorem of Cayley (1878) shows that abstraction of the group con- 
cept is, in a sense, empty, because every group is essentially the same as a 
group of permutations. Cayley proved the theorem for finite groups only, 
where it is more valuable, but the proof easily extends to arbitrary groups 
(see exercises). 


EXERCISES 


The proof of Cayley’s theorem goes as follows. Given any group $G$, associate
any $g$ in $G$ with the function $\times g$ that sends each $h \in G$ to $hg$.


14.4.1 Show that the function $\times g$ is a permutation of $G$, by showing that its effect can
be undone by the function $\times g^{-1}$.


14.4.2 Show that different group elements $g_1, g_2$ give different functions $\times g_1, \times g_2$,
and hence that there is a one-to-one correspondence between the elements
$g$ in $G$ and the permutations $\times g$ of $G$.


14.4.3 Show that the permutation of $G$ obtained by applying $\times g_1$, then $\times g_2$, is the
permutation obtained by applying $\times g_1 g_2$.


Thus the group of permutations $\times g$ is isomorphic to the group $G$, in the sense
described in the previous exercise set. This is the precise way of saying that G is 
essentially the same as a group of permutations. 
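The three exercises above can be checked mechanically on a small example. The sketch below assumes $G = S_3$ purely as a convenient test case; the names `G`, `times`, and `maps` are hypothetical and belong only to this illustration. Each right multiplication $h \mapsto hg$ is recorded as a rearrangement of the positions $0, \ldots, 5$ of the elements of $G$.

```python
from itertools import permutations


def compose(f, g):
    """The product f . g of permutations of {1, 2, 3}, read as 'g, then f'."""
    return tuple(f[g[i] - 1] for i in range(len(g)))


G = list(permutations((1, 2, 3)))   # S_3, treated here as an abstract group G


def times(g):
    """Right multiplication h -> hg, recorded as a tuple of positions in G."""
    return tuple(G.index(compose(h, g)) for h in G)


maps = {g: times(g) for g in G}

# Exercise 14.4.1: each right multiplication permutes the elements of G.
is_permutation = all(sorted(m) == list(range(len(G))) for m in maps.values())

# Exercise 14.4.2: different elements give different permutations.
one_to_one = len(set(maps.values())) == len(G)

# Exercise 14.4.3: right multiplication by g1, then by g2, is right
# multiplication by the product g1 g2.
respects_product = all(
    tuple(maps[g2][i] for i in maps[g1]) == maps[compose(g1, g2)]
    for g1 in G
    for g2 in G
)
```

Nothing in the check uses the fact that the elements of `G` happen to be permutations already; only the associative product `compose` matters.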
14.5 Polyhedral Groups


A beautiful illustration of Cayley’s theorem that every group is a permu- 
tation group is provided by the regular polyhedra, whose rotation groups 
turn out to be important subgroups of $S_4$ and $S_5$. If we imagine a polyhe-
dron P occupying a region R in space, the rotations of P can be viewed as 
the different ways of fitting P into R. 

We begin with the rotations of the tetrahedron $T$. $T$ has four vertices,
$V_1, V_2, V_3, V_4$, so each rotation of $T$ is determined by a permutation of the
four things $V_1, V_2, V_3, V_4$. There are $4 \times 3 = 12$ rotations, because $V_1$ can be
put at any of the four vertices of $R$, after which three choices remain for the
remaining triangle of vertices $V_2, V_3, V_4$. One can check, using the fact that
a permutation that leaves one element fixed and rotates the other three is
even, that all the rotations of $T$ are even permutations of $V_1, V_2, V_3, V_4$.
But the subgroup $A_4$ of all even permutations in $S_4$ has $\frac{1}{2} \times 4! = 12$ elements
by the exercises in Section 14.3, so the rotation group of $T$ is precisely $A_4$.
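The count of 12 can be confirmed by generating the rotations from two of them. The sketch below rests on an assumption about labeling: the rotation about the axis through $V_1$ is taken to act on the vertex labels as the 3-cycle $(2, 3, 4)$, and the rotation about the axis through $V_4$ as the 3-cycle $(1, 2, 3)$. The function names are invented for the example.

```python
def compose(f, g):
    """The product f . g of permutations of {1, ..., n}, read as 'g, then f'."""
    return tuple(f[g[i] - 1] for i in range(len(g)))


def is_even(f):
    """Even number of inversions, i.e. pairs i < j with f(i) > f(j)."""
    n = len(f)
    return sum(f[i] > f[j] for i in range(n) for j in range(i + 1, n)) % 2 == 0


def generate(generators):
    """All products of the given permutations (the subgroup they generate)."""
    identity = tuple(range(1, len(generators[0]) + 1))
    group, frontier = {identity}, [identity]
    while frontier:
        x = frontier.pop()
        for g in generators:
            y = compose(g, x)
            if y not in group:
                group.add(y)
                frontier.append(y)
    return group


about_v1 = (1, 3, 4, 2)   # fixes 1 and cycles 2 -> 3 -> 4 -> 2
about_v4 = (2, 3, 1, 4)   # fixes 4 and cycles 1 -> 2 -> 3 -> 1

rotations = generate([about_v1, about_v4])
all_even = all(is_even(f) for f in rotations)
```

Because the group is finite, closing up under multiplication by the generators is enough; inverses appear automatically as powers.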

The full permutation group $S_4$ can be realized by the rotations of the
cube. The four elements of the cube that are permuted are the long diago- 
nals (shown in red, yellow, blue, and green in Figure 14.3). 


Figure 14.3: The cube and its diagonals 


One has to check, first, that each permutation of the diagonals actually
occurs. While doing this, it becomes apparent that the position of the
diagonals (bearing in mind that endpoints could be swapped) really determines
the position of the cube (Exercise 14.5.1). $S_4$ is also the rotation group of
the octahedron, because of the dual relationship between cube and
octahedron seen in Figure 14.4. Each rotation of the cube is clearly a rotation of
its dual octahedron, and conversely.


Figure 14.4: Dual polyhedra 


Likewise, the dual relationship between dodecahedron and icosahe- 
dron (Figure 14.4) shows that they have the same rotation group. This 
group turns out to be $A_5$, the subgroup of even permutations in $S_5$. The
five elements of the dodecahedron whose even permutations determine 
these rotations are tetrahedra formed from sets of four vertices (see Figure 
14.5). 


Figure 14.5: The tetrahedra in a dodecahedron

More on the polyhedral groups is in the famous book of Klein (1884),
relating the theory of equations to the rotations of the regular
polyhedra and functions of a complex variable. The complex variable makes
its appearance when the regular polyhedra are replaced by regular
tessellations of the sphere $\mathbb{C} \cup \{\infty\}$, and their rotations by linear fractional
transformations, as in Section 13.9. Klein (1876) showed that, with
trivial exceptions, all finite groups of linear fractional transformations come
from the rotations of the regular polyhedra in this way.

The regular polyhedra were also the source of another approach to
groups: presentation by generators and relations. Hamilton (1856) showed
that the icosahedral group can be generated by three elements $\iota, \kappa, \lambda$
subject to the relations

$$\iota^2 = \kappa^3 = \lambda^5 = 1, \qquad \lambda = \iota\kappa. \tag{1}$$

This means that any element of the icosahedral group is a product (possibly
with repetitions) of $\iota, \kappa, \lambda$ and that any relation between $\iota, \kappa, \lambda$ follows from
the relations (1). Dyck (1882) gave similar presentations of the cube and
tetrahedron groups, and for the groups of certain finite tessellations, as part
of the first general discussion of generators and relations. We return to this
in Section 14.7.


EXERCISES 
14.5.1 Show that each permutation of the diagonals of a cube is realizable, for 
example by showing that each transposition is realizable. 


14.5.2 Show that a permutation of the diagonals uniquely determines the position 
of the cube. 


Now consider the following rotations of the cube: 


$\iota$ = 180° rotation about a line through the midpoints of opposite edges,


$\kappa$ = 120° rotation about a diagonal.

These obviously satisfy $\iota^2 = \kappa^3 = 1$.

14.5.3 Show that $\iota\kappa = \lambda$, where


$\lambda$ = 90° rotation about a line through the centers of opposite faces,


where the lines are, for example, the blue, red, and green ones shown in 
Figure 14.6 (these lines are fixed in space, not in the cube). 


14.5.4 Deduce from Exercise 14.5.3 that $\iota^2 = \kappa^3 = (\iota\kappa)^4 = 1$ for the cube.

Figure 14.6: Rotation axes of the cube


14.5.5 Show that the analogous $\iota, \kappa$ for the tetrahedron satisfy
$$\iota^2 = \kappa^3 = (\iota\kappa)^3 = 1,$$
and the analogous $\iota, \kappa$ for the dodecahedron satisfy
$$\iota^2 = \kappa^3 = (\iota\kappa)^5 = 1.$$
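For the cube, the relations of Exercise 14.5.4 can be observed in $S_4$ acting on the four diagonals. The sketch below makes two labeled assumptions: that the 180° rotation swaps diagonals 1 and 2 while leaving diagonals 3 and 4 in place, and that the 120° rotation fixes diagonal 1 and cycles the other three. The variable names `iota`, `kappa`, and `lam` are chosen for this example only, and the code checks element orders, not the geometric identification.

```python
def compose(f, g):
    """The product f . g of permutations of {1, ..., n}, read as 'g, then f'."""
    return tuple(f[g[i] - 1] for i in range(len(g)))


def order(f):
    """Smallest k >= 1 for which f^k is the identity."""
    identity = tuple(range(1, len(f) + 1))
    current, k = f, 1
    while current != identity:
        current, k = compose(current, f), k + 1
    return k


iota = (2, 1, 3, 4)     # the transposition (1, 2) of the diagonals
kappa = (1, 3, 4, 2)    # the 3-cycle (2, 3, 4) of the diagonals
lam = compose(iota, kappa)

orders = (order(iota), order(kappa), order(lam))
```

With these choices the product is a 4-cycle of the diagonals, matching a 90° rotation about a face axis.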


14.6 Groups and Geometries 


As the regular polyhedra show, geometric symmetry is fundamentally a 
group-theoretic notion. More generally, many notions of equivalence in 
geometry can be explained as properties preserved by certain groups of 
transformations. However, some revision of classical notions was needed 
before geometry could benefit from group-theoretic ideas. 

The oldest notion of geometric equivalence is that of congruence. The
Greeks understood figures $F_1$ and $F_2$ to be congruent if there is a rigid
motion of $F_1$ that carries it into $F_2$. But this concept of motion had
meaning only for the individual figure. The “product” of motions of different
figures was meaningless, so one did not have a group of motions.

The first step on the path to group theory in geometry was
extending the idea of motion to the whole plane by Möbius (1827). This gave
meaning to the product of motions. In fact, Möbius considered the class
of all continuous transformations of the plane that preserve straightness
of lines, and he picked out several subclasses: those that preserve length
(congruences), shape (similarities), and parallelism (affinities). He showed
that the most general continuous transformations preserving straightness
are just the projective transformations, so in one stroke Möbius defined
the notions of congruence, similarity, affinity, and projective equivalence
as properties invariant under certain classes of transformations. That the
classes in question are groups was obvious as soon as one recognized the
concept of group. But the group concept was recognized only slowly: the
ideas of Möbius were first stated in terms of groups by Klein (1872).

Klein’s formulation became known as the Erlanger Programm because 
he announced it at the University of Erlangen. He associated each geome- 
try with a group of transformations that preserve its characteristic proper- 
ties. Thus, characteristic properties show up as invariants of a group. For 
example, the group of plane Euclidean geometry is the group of Euclidean 
rigid motions: transformations of $\mathbb{R}^2$ that preserve the Euclidean distance

$$\sqrt{(x_2 - x_1)^2 + (y_2 - y_1)^2}$$

between points $(x_1, y_1)$ and $(x_2, y_2)$. Euclidean
distance is therefore an invariant, by the very definition of the group.

A more interesting example is the group of the real projective line
$\mathbb{RP}^1$, studied in Section 7.6. Here we start with the group, the group of
real linear fractional transformations, and discover its invariant, the
cross-ratio, which is not at all obvious visually. Plane projective geometry is
similarly associated with the group of projective transformations of $\mathbb{RP}^2$,
and its fundamental invariant is likewise the cross-ratio.

Plane hyperbolic geometry, in view of the projective model, can be 
defined by the group of projective transformations that map the unit circle 
onto itself. An important influence on the Erlanger Programm was indeed 
Cayley (1859), where this group was first shown to determine a geometry, 
and the subsequent realization of Klein (1871) that the elements of this 
group are the rigid motions of hyperbolic geometry. Not surprisingly, its 
fundamental invariant is the hyperbolic distance, and this turns out to be a 
function of the cross-ratio. 

Poincaré (1882) discovered that the rigid motions of the half-plane
model are determined by projective transformations of its boundary, the
real projective line, as we saw in Section 13.9. So hyperbolic geometry
is also definable by the group of real linear fractional transformations,
by extending these transformations from the line to the half-plane. (For
an introduction to hyperbolic geometry based on this idea, see Stillwell
(2005).)

When geometry is reformulated in terms of groups, certain geomet- 
ric questions become natural questions about groups. A regular tessella- 
tion, for example, corresponds to a subgroup of the full group of motions, 
consisting of those motions that map the tessellation onto itself. In the 
case of hyperbolic geometry, where the problem of classifying tessella- 
tions is formidable, the interplay between geometric and group-theoretic 
ideas proved to be very fruitful. In the work of Poincaré (1882, 1883) and 
Klein (1882b), group theory is the catalyst for a new synthesis of geomet- 
ric, topological, and combinatorial ideas, which are described in Sections 
14.7 and 15.5. 


EXERCISES 


If we view geometric objects (points, lines, curves, and so on) as subsets $X$ of
a space $S$, then relations such as congruence arise from groups of transformations
of $S$ in the following way. There is a group $G$ of maps $g : S \to S$, and each
geometric object $X$ has a $G$-orbit $\{g(X) : g \in G\}$, consisting of the objects onto
which $X$ is mapped by elements of $G$.

For example, if $\Delta$ is a triangle in the plane $\mathbb{R}^2$, and $G$ consists of all
transformations of $\mathbb{R}^2$ that preserve length, then $\{g(\Delta) : g \in G\}$ consists of all triangles
congruent to $\Delta$. This example shows that members of the same $G$-orbit are
“equivalent” in a sense that depends on $G$. In fact, we always get an equivalence relation
from a group in this way. Here is another example.


14.6.1 If $G = \{\text{similarities of } \mathbb{R}^2\}$, what is $\{g(\Delta) : g \in G\}$ for a triangle $\Delta$?


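For any group $G$ of transformations, we define a relation $X \cong_G Y$ (“$X$ is
$G$-equivalent to $Y$”) between subsets $X$, $Y$ of $S$ by

$$X \cong_G Y \iff X \text{ is in the } G\text{-orbit of } Y.$$

Then the group properties of $G$ imply the following properties of the relation $\cong_G$.


14.6.2 Show that the relation $\cong_G$ has the properties

$$\begin{aligned}
&X \cong_G X && \text{(reflexive)} \\
&X \cong_G Y \Rightarrow Y \cong_G X && \text{(symmetric)} \\
&X \cong_G Y \text{ and } Y \cong_G Z \Rightarrow X \cong_G Z && \text{(transitive)}
\end{aligned}$$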


14.6.3 At which points does your solution of Exercise 14.6.2 involve the existence
of an identity, existence of inverses, and existence of products in $G$?

The properties in Exercise 14.6.2 show that $\cong_G$ is an equivalence relation,
according to the definition in the exercises for Section 2.1. There it was also noted
that the reflexive and transitive properties actually imply symmetry, provided that
transitivity is stated in the manner of Euclid’s Common Notion 1: “Things
equivalent to the same thing are equivalent to each other.”


14.6.4 Prove Common Notion 1 for $\cong_G$:

$$X \cong_G Y \text{ and } Z \cong_G Y \Rightarrow X \cong_G Z.$$

You will see that this proof involves inverses, which previously were needed
only to prove symmetry. This confirms that Euclid’s Common Notion 1 is in some
sense a combination of both transitivity and symmetry.

Returning to a particular group and its invariants, here is an example of the 
way in which an invariant can throw light on its group. 


14.6.5 Given three points $A, B, C$ on $\mathbb{RP}^1$, show that there is a unique fourth point
$x$ such that the cross-ratio

$$\frac{(C - A)(x - B)}{(C - B)(x - A)}$$

has a given value $y$.


14.6.6 Deduce from Exercise 14.6.5 that each linear fractional transformation of
$\mathbb{RP}^1$ is determined by its values on any three points $A, B, C$.
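Both exercises can be explored with exact rational arithmetic. The sketch below is a hedged illustration: the function names `cross_ratio`, `fourth_point`, and `lft` are assumptions of this example, and for simplicity it works only with finite points whose denominators do not vanish, so the point $\infty$ of the projective line is left out.

```python
from fractions import Fraction


def cross_ratio(a, b, c, x):
    """The cross-ratio (c - a)(x - b) / ((c - b)(x - a)) of Exercise 14.6.5."""
    return Fraction((c - a) * (x - b)) / Fraction((c - b) * (x - a))


def fourth_point(a, b, c, y):
    """Solve cross_ratio(a, b, c, x) == y for x (finite solutions only)."""
    # y (c - b)(x - a) = (c - a)(x - b) is linear in x.
    numerator = Fraction(y * a * (c - b) - b * (c - a))
    denominator = Fraction(y * (c - b) - (c - a))
    return numerator / denominator


def lft(p, q, r, s):
    """The linear fractional map x -> (p x + q) / (r x + s), with p s - q r != 0."""
    return lambda x: Fraction(p * x + q) / Fraction(r * x + s)


A, B, C, x = 0, 1, 2, 5
T = lft(2, 1, 1, 3)            # an arbitrary example map with determinant 5

before = cross_ratio(A, B, C, x)
after = cross_ratio(T(A), T(B), T(C), T(x))
recovered = fourth_point(A, B, C, before)
```

Since the cross-ratio is preserved and the fourth point is determined by it, the images of $A$, $B$, $C$ force the image of every other point, which is the content of Exercise 14.6.6.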


14.7 Combinatorial Group Theory 


As mentioned in Section 14.5, the groups of the regular polyhedra were the 
first to be defined in terms of generators and relations. With finite groups 
such as these, however, one is concerned mainly with the simplicity and 
elegance of a presentation; the question of existence does not arise. For any 
finite group G one can trivially obtain a finite set of generators (namely, all 
the elements $g_1, \ldots, g_n$ of $G$) and defining relations (namely, all equations
$g_i g_j = g_k$ holding among the generators). Of course the same argument
gives an infinite set of generators and defining relations for an infinite 
group, but this is also not interesting. The problem is to find finite sets 
of generators and defining relations for infinite groups where possible. 
This problem was first solved for the symmetry groups of certain
regular tessellations, and such examples were the basis of the first systematic
study of generators and relations, by Klein’s student Dyck. Dyck’s papers
(1882, 1883) laid the foundations of this approach to group theory, now
called combinatorial (and, more recently, geometric). For more technical
information, as well as detailed history of the development of
combinatorial group theory, see Chandler and Magnus (1982).

Figure 14.7 illustrates how generators and relations arise naturally 
from tessellations. This tessellation is based on the regular tessellation 
of the Euclidean plane by unit squares, but each square has been subdi- 
vided into black and white triangles to eliminate symmetries by rotation 
and reflection. The symmetries that remain are generated by 


$a$ = horizontal translation of length 1,
$b$ = vertical translation of length 1.

These generators are subject to the obvious relation

$$ab = ba,$$

which implies that any element of the group can be written in the form
$a^m b^n$. If $g = a^{m_1} b^{n_1}$ and $h = a^{m_2} b^{n_2}$, then $g = h$ only if $m_1 = m_2$ and
$n_1 = n_2$, that is, only if $g = h$ is a consequence of the relation $ab = ba$.
Thus all relations $g = h$ in the group follow from $ab = ba$, which means
that the latter relation is a defining relation of the group.


Figure 14.7: A tessellation of the plane

The obviousness of the defining relation in this case blinds us to a
fact that becomes more evident with tessellations of the hyperbolic plane:
the generators and relations can be read off the tessellation. Group
elements correspond to cells in the tessellation, squares in the present
example. If we fix the square corresponding to the identity element 1, then the
square to which square 1 is sent by the group element $g$ may be called
square $g$. The generators $a^{\pm 1}$, $b^{\pm 1}$ are the elements that send square 1 to
adjacent squares. They generate the group because square 1 can be sent
to any other square by a series of moves from square to adjacent square.
Relations correspond to equal sequences of moves or, what amounts to
the same thing, to sequences of moves that return square 1 to its starting
position. These sequences can all be derived from a circuit around a vertex
(Figure 14.8), that is, the sequence $aba^{-1}b^{-1}$. Thus all relations are derived
from $aba^{-1}b^{-1} = 1$, or, equivalently, $ab = ba$.


Figure 14.8: Circuit around a vertex 


Generalizing these ideas, Poincaré (1882) showed that the symmetry 
groups of all regular tessellations, whether of the sphere, Euclidean plane, 
or hyperbolic plane, can be represented by finitely many generators and 
relations. Generators correspond to moves of the basic cell to adjacent 
cells, and hence to the sides of the basic cell; defining relations corre- 
spond to its circuits around its vertices. These results are also important 
for topology, as we will see in Chapter 15. 

The notion of group abstracted from such examples was expressed in 
a somewhat technical way, involving normal subgroups, by Dyck (1882). 




The following, simpler, approach was worked out by Dehn and used by 
Dehn’s student Magnus (1930). A group $G$ is defined by a set $\{a_1, a_2, \ldots\}$
of generators and a set $\{W_1 = W_1', W_2 = W_2', \ldots\}$ of defining relations.
Each generator $a_i$ is called a letter; $a_i$ has an inverse $a_i^{-1}$, and arbitrary
finite sequences (“products”) of letters and inverse letters are called words.

Words $W$, $W'$ are called equivalent if $W = W'$ is a consequence of the
defining relations, that is, if $W$ may be converted to $W'$ by a sequence of
replacements of subwords $W_i$ by $W_i'$ (or vice versa) and cancellation (or
insertion) of subwords $a_i a_i^{-1}$, $a_i^{-1} a_i$. The elements of $G$ are the equivalence
classes
classes 

[W] = {W’ : W’ is equivalent to W}, 


and the product of elements [U], [V] is defined by 
[U][V] = [UV], 


where UV denotes the result of concatenating the words U, V. It has to be 
checked that this product is well defined, but once this is done, the group 
properties (i), (ii), and (iii) of Section 14.1 follow easily. 


EXERCISES 
Here is how one verifies that the classes [W] have the group properties. 


14.7.1 If U is equivalent to U’, show that UV is equivalent to U’V. Conclude, 
using this and a similar result for V’, that the product [U][V] is independent 
of the choice of representatives for [U], [V]. 


14.7.2 $[U]([V][W]) = ([U][V])[W]$ is trivial. Why?
14.7.3 Show that 1 = equivalence class of the empty word. 


14.7.4 Show that $[W]^{-1} = [W^{-1}]$, where $W^{-1}$ is the result of writing $W$ backward
and changing the sign of each exponent.


The smallest nonabelian group $S_3$ is also the smallest group with interesting
defining relations. We take $S_3$ to be the group of symmetries of the equilateral
triangle, as in the exercises to Section 14.2.


14.7.5 Show that $S_3$ is generated by a 120° rotation $r$ about its center, and a 180°
rotation $s$ about the vertical axis of symmetry. Also show that $r$ and $s$ satisfy
the relations

$$r^3 = s^2 = 1, \qquad r^2 s = sr.$$


14.7.6 Deduce from Exercise 14.7.5 that each element of $S_3$ can be written in the
form

$$r^m s^n,$$

where $m = 0, 1, 2$ and $n = 0, 1$.

14.7.7 Conclude from Exercise 14.7.6 that $r^3 = s^2 = 1$ and $r^2 s = sr$ are defining
relations for $S_3$.


14.7.8 By a similar argument, show the group of symmetries of a regular $n$-gon
has defining relations $r^n = s^2 = 1$, $r^{n-1} s = sr$.
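The relations in Exercises 14.7.5 and 14.7.8 can be checked on a concrete model, though checking that they hold is weaker than proving they are defining relations. The sketch below assumes the vertices of the regular $n$-gon are numbered $0, \ldots, n-1$ in order, takes $r$ to be the rotation $i \mapsto i + 1$ and $s$ the flip $i \mapsto -i$ (both mod $n$); these choices and the function names are assumptions of the example.

```python
def compose(f, g):
    """The product f . g of permutations of {0, ..., n-1}, read as 'g, then f'."""
    return tuple(f[g[i]] for i in range(len(g)))


def power(f, k):
    """f^k for k >= 0, with f^0 the identity."""
    result = tuple(range(len(f)))
    for _ in range(k):
        result = compose(result, f)
    return result


def dihedral_check(n):
    """Return (relations hold?, number of distinct elements r^m s^k)."""
    identity = tuple(range(n))
    r = tuple((i + 1) % n for i in range(n))    # rotation by one vertex
    s = tuple((-i) % n for i in range(n))       # flip fixing vertex 0
    relations_hold = (
        power(r, n) == identity
        and power(s, 2) == identity
        and compose(power(r, n - 1), s) == compose(s, r)
    )
    elements = {
        compose(power(r, m), power(s, k)) for m in range(n) for k in range(2)
    }
    return relations_hold, len(elements)
```

For $n = 3$ the six products $r^m s^k$ are all different, in line with the normal form of Exercise 14.7.6.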


14.8 Finite Simple Groups 


A group is called simple if it has no normal subgroups other than itself 
and the group {1} whose only member is the identity element. The reason 
for the name is that a simple group cannot be “simplified” by forming the 
quotient G/H by a normal subgroup H. In this sense of simplicity, simple 
groups are like prime numbers, which cannot be “simplified” by dividing 
them by smaller integers. We do not claim that simple groups or prime 
numbers are not complicated! 

The most obvious examples of finite simple groups are in fact the
prime numbers, or more precisely the cyclic groups $\mathbb{Z}_p$ for prime
numbers $p$. $\mathbb{Z}_p$ is simple because it has no subgroups whatever except itself
and $\{1\}$ (thanks to Lagrange’s theorem that the size of a subgroup divides
the size of the group). In fact, these are the only abelian simple groups, and
we will ignore them from now on. The interesting simple groups are those
that are not abelian, and the first examples were discovered by Galois in
his study of polynomial equations.

The smallest nonabelian simple group is $A_5$, the group of the 60 even
permutations of five things. The simplicity of $A_5$ is the obstruction to the
solution of the quintic equation by radicals. As we saw in Section 14.3, the
group of the quintic equation is $S_5$, the group of all 120 permutations of
five things. Solving the quintic equation by radicals is equivalent to finding
a chain of subgroups

$$S_5 \supseteq H_1 \supseteq H_2 \supseteq \cdots \supseteq \{1\}$$

such that the quotient of each group by the next is cyclic. We can make a
first step,

$$S_5 \supseteq A_5,$$

but we can go no further because $S_5$ has no other nontrivial normal
subgroup and $A_5$ is simple.
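The simplicity of $A_5$ can also be confirmed by a counting argument that differs from the route taken in the exercises below. It uses two standard facts: a normal subgroup is a union of conjugacy classes that includes the class of the identity, and its size divides 60 by Lagrange's theorem. The sketch computes the class sizes by brute force; all function and variable names are assumptions of this illustration.

```python
from itertools import combinations, permutations


def compose(f, g):
    """The product f . g of permutations of {0, ..., 4}, read as 'g, then f'."""
    return tuple(f[g[i]] for i in range(len(g)))


def inverse(f):
    """The permutation undoing f."""
    result = [0] * len(f)
    for i, value in enumerate(f):
        result[value] = i
    return tuple(result)


def is_even(f):
    """Even number of inversions."""
    n = len(f)
    return sum(f[i] > f[j] for i in range(n) for j in range(i + 1, n)) % 2 == 0


A5 = [f for f in permutations(range(5)) if is_even(f)]
identity = tuple(range(5))


def conjugacy_class(h):
    """All conjugates g h g^-1 with g in A5."""
    return frozenset(compose(compose(g, h), inverse(g)) for g in A5)


classes = {conjugacy_class(h) for h in A5}
class_sizes = sorted(len(c) for c in classes)

# Candidate sizes of a normal subgroup: 1 (for the identity) plus the sizes
# of some of the other classes, subject to dividing 60.
other_sizes = [len(c) for c in classes if identity not in c]
possible_orders = sorted(
    {
        1 + sum(subset)
        for k in range(len(other_sizes) + 1)
        for subset in combinations(other_sizes, k)
        if 60 % (1 + sum(subset)) == 0
    }
)
```

Only the sizes 1 and 60 survive, so the only normal subgroups are $\{1\}$ and $A_5$ itself.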

The proof that $A_5$ is simple (see exercises below) can be extended
to show that $A_n$ is simple for all $n \geq 5$, so Galois actually discovered a
whole infinite family of simple groups. He also found three remarkable
simple groups in the study of modular equations, which arise in the
theory of elliptic functions. The starting point of these investigations was the
Fagnano (1718) formula for doubling the arc length of the lemniscate
(Section 10.6):

$$2\int_0^x \frac{dt}{\sqrt{1 - t^4}} = \int_0^y \frac{dt}{\sqrt{1 - t^4}}, \quad \text{where } y = \frac{2x\sqrt{1 - x^4}}{1 + x^4}.$$

This gives the polynomial equation between $x$ and $y$, of degree 2 in $y$:

$$y^2(1 + x^4)^2 = 4x^2(1 - x^4).$$

In the early 19th century, Fagnano’s discovery was generalized to other
elliptic integrals and to $n$-tupling instead of doubling, by Legendre, Gauss,
Abel, and Jacobi. Galois left only some cryptic remarks about
multiplication by 5, 7, and 11 (implying that they yield equations of degree 5, 7, and
11) in a letter that he wrote just before his death.
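Fagnano's doubling formula can be tested numerically. The sketch below is only a sanity check at one sample value, $x = 0.3$, chosen small enough that both integrals stay well inside the interval $0 \le t < 1$; the integration routine `lemniscate_arc` (Simpson's rule) and the tolerance used afterward are assumptions of this example.

```python
from math import sqrt


def lemniscate_arc(x, steps=2000):
    """Simpson's rule for the integral of dt / sqrt(1 - t^4) from 0 to x, 0 <= x < 1."""
    h = x / steps                      # steps must be even for Simpson's rule
    total = 1.0 + 1.0 / sqrt(1.0 - x ** 4)
    for k in range(1, steps):
        t = k * h
        weight = 4 if k % 2 else 2
        total += weight / sqrt(1.0 - t ** 4)
    return total * h / 3


x = 0.3
y = 2 * x * sqrt(1 - x ** 4) / (1 + x ** 4)

# Twice the arc from 0 to x should match the arc from 0 to y ...
doubling_gap = abs(2 * lemniscate_arc(x) - lemniscate_arc(y))
# ... and x, y should satisfy the polynomial equation of degree 2 in y.
polynomial_gap = abs(y ** 2 * (1 + x ** 4) ** 2 - 4 * x ** 2 * (1 - x ** 4))
```

Both gaps are at the level of rounding and discretization error, which is consistent with the formula but of course does not prove it.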

It turns out that the modular equation of degree 5 is equivalent to the 
general quintic equation, which is why Hermite (1858) was able to solve 
the general quintic equation by elliptic modular functions. However, the 
modular equations of degree 7 and 11 have groups of size 336 and 1320
respectively, so they are not symmetric groups $S_n$. The nature of these new
groups was revealed by Jordan (1870). They can be viewed as (what we 
would now call) transformation groups of finite projective lines. 

What is a finite projective line? It is like the real projective line
$\mathbb{RP}^1 = \mathbb{R} \cup \{\infty\}$ discussed in Section 7.6, except that $\mathbb{R}$ is replaced by a finite
field. Finite fields were discovered by Galois, and we met some of them
in Section 14.1 when we discussed addition and multiplication mod $p$.
Since the latter operations have the same behavior as ordinary addition
and multiplication (in particular, each nonzero number has an inverse),
we can operate on the set $\mathbb{F}_p = \{0, 1, 2, \ldots, p - 1\}$ as we normally do,
to solve equations and so on. Moreover, linear fractional functions make
sense on $\mathbb{F}_p \cup \{\infty\}$, if we agree as usual that

$$1/0 = \infty \quad \text{and} \quad 1/\infty = 0.$$

So we can view $\mathbb{F}_p \cup \{\infty\}$ as a finite projective line, and its linear
fractional functions as “projections.” Moreover, the cross-ratio makes sense
on $\mathbb{F}_p \cup \{\infty\}$, and it is invariant under linear fractional functions by the
same argument as in Section 7.6.

For this reason, the group of functions

$$\frac{ax + b}{cx + d}, \quad \text{where } a, b, c, d \in \mathbb{F}_p \text{ and } ad - bc \neq 0,$$

is called the projective general linear group, PGL(2, p). The reason for
the 2 is that the coefficients $a, b, c, d$ behave like the $2 \times 2$ matrix $\begin{pmatrix} a & b \\ c & d \end{pmatrix}$.
It turns out that PGL(2, 5), PGL(2, 7), and PGL(2, 11) are the groups of
the modular equations of degree 5, 7, and 11 respectively. Moreover, each
of these groups PGL(2, p) contains a simple subgroup, called PSL(2, p),
which is half of its size. This was shown by Jordan (1870).
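The group sizes quoted here can be recovered by listing the linear fractional functions on $\mathbb{F}_p \cup \{\infty\}$ directly. The sketch below makes some labeled assumptions: the point $\infty$ is encoded as the integer `p`, the helper names are invented, and PSL(2, p) is taken to consist of the functions whose determinant $ad - bc$ is a nonzero square mod $p$, which is a standard description for odd $p$ but is not stated in the text. It requires Python 3.8 or later for modular inverses via `pow`.

```python
def moebius(a, b, c, d, p):
    """x -> (ax + b)/(cx + d) on F_p U {oo}, as a tuple of images; index p is oo."""
    infinity = p
    images = []
    for x in range(p + 1):
        if x == infinity:
            numerator, denominator = a % p, c % p      # value at oo is a/c
        else:
            numerator, denominator = (a * x + b) % p, (c * x + d) % p
        if denominator == 0:
            images.append(infinity)                    # convention 1/0 = oo
        else:
            images.append(numerator * pow(denominator, -1, p) % p)
    return tuple(images)


def functions_with_determinant_in(p, allowed):
    """Distinct linear fractional functions whose determinant lies in `allowed`."""
    return {
        moebius(a, b, c, d, p)
        for a in range(p)
        for b in range(p)
        for c in range(p)
        for d in range(p)
        if (a * d - b * c) % p in allowed
    }


def pgl2(p):
    return functions_with_determinant_in(p, set(range(1, p)))


def psl2(p):
    nonzero_squares = {t * t % p for t in range(1, p)}
    return functions_with_determinant_in(p, nonzero_squares)


orders = {p: (len(pgl2(p)), len(psl2(p))) for p in (5, 7, 11)}
```

Different coefficient quadruples that are multiples of one another give the same function, which is why a set of image tuples is used to count each function once.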

PSL(2, 5) is the same as $A_5$, but PSL(2, 7) is a new simple group with
168 elements, and PSL(2, 11) is a simple group with 660 elements. It also
happens that PSL(2, 7) is the smallest nonabelian simple group, other than
PSL(2, 5) = $A_5$. PSL(2, 7) makes several other spectacular appearances in
geometry, which may be seen in the article Gray (1982).

These examples give only the tiniest glimpse of the world of simple 
groups. Nevertheless, they hint at one of its most fascinating features— 
there are meaningful finite analogues of infinite structures such as the real 
projective line. 


EXERCISES 


$A_5$ is simple for quite elementary reasons, which can be understood with only
slight knowledge of permutations. This includes the nature of even permutations, 
explored in the exercises to Section 14.3, and the decomposition of permutations 
into cycles, which we explore here. 

We say that $(a_1, a_2, \ldots, a_k)$ is a k-cycle of a permutation f of {1, 2, ..., n} if

$$f(a_1) = a_2, \quad f(a_2) = a_3, \quad \ldots, \quad f(a_k) = a_1$$

for distinct numbers $a_1, a_2, \ldots, a_k$. Each number in {1, 2, ..., n} belongs to some
k-cycle of f, so f is a product of disjoint cycles. For example, if f sends each
number in the top row to the number below it in

$$\begin{pmatrix} 1 & 2 & 3 & 4 & 5 \\ 2 & 1 & 4 & 5 & 3 \end{pmatrix}$$

then f = (1, 2)(3, 4, 5). It follows from the Exercises in Section 14.3 that the only
k-cycles with k > 1 among the even permutations of {1, 2, 3, 4, 5} are the 3-cycles and
the 5-cycles.
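
The decomposition into disjoint cycles is easy to mechanize. Here is a minimal Python sketch, in which the function name `cycles` and the dictionary representation of f are our own choices, that follows each number around until it returns to its starting point. It omits 1-cycles, as the exercises below do.

```python
def cycles(f):
    """Disjoint cycles of a permutation given as a dict {x: f(x)}, 1-cycles omitted."""
    seen, result = set(), []
    for start in sorted(f):
        if start in seen:
            continue
        cycle, x = [], start
        while x not in seen:        # follow start -> f(start) -> f(f(start)) -> ...
            seen.add(x)
            cycle.append(x)
            x = f[x]
        if len(cycle) > 1:
            result.append(tuple(cycle))
    return result


# The example f = (1, 2)(3, 4, 5) from the text.
f = {1: 2, 2: 1, 3: 4, 4: 5, 5: 3}
print(cycles(f))   # [(1, 2), (3, 4, 5)]
```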
14.8.1 Omitting 1-cycles from the cycle decomposition, show that the only
possible types of cycle decomposition for (nonidentity) members of $A_5$ are
(a, b, c), (a, b)(c, d), and (a, b, c, d, e).


14.8.2 Recalling that f · g means “g, then f,” check that


(i) (1, 2, 3, 4, 5) · (2, 1, 4, 3, 5) = (1, 5, 3).
(ii) (1, 2)(3, 4) · (1, 2)(4, 5) = (3, 4, 5).
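
Products like these can also be checked mechanically. The sketch below is one possible way to do so in Python; the helper names `from_cycles`, `compose`, and `cycles` are hypothetical, and permutations of {1, ..., 5} are stored as dictionaries. Note that `compose(f, g)` applies g first and then f, matching the convention for f · g.

```python
def from_cycles(cycle_list, n=5):
    """Permutation of {1..n} (as a dict) from a list of disjoint cycles."""
    f = {x: x for x in range(1, n + 1)}
    for cyc in cycle_list:
        for i, a in enumerate(cyc):
            f[a] = cyc[(i + 1) % len(cyc)]   # each entry maps to the next one
    return f


def compose(f, g):
    """The product f . g, meaning 'g, then f'."""
    return {x: f[g[x]] for x in g}


def cycles(f):
    """Disjoint cycle decomposition, omitting 1-cycles."""
    seen, result = set(), []
    for start in sorted(f):
        cycle, x = [], start
        while x not in seen:
            seen.add(x)
            cycle.append(x)
            x = f[x]
        if len(cycle) > 1:
            result.append(tuple(cycle))
    return result


first = compose(from_cycles([(1, 2, 3, 4, 5)]), from_cycles([(2, 1, 4, 3, 5)]))
second = compose(from_cycles([(1, 2), (3, 4)]), from_cycles([(1, 2), (4, 5)]))
print(cycles(first), cycles(second))   # [(1, 5, 3)] [(3, 4, 5)]
```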


The preceding exercises show that a subgroup H of $A_5$ with enough elements
of type (a, b)(c, d) or (a, b, c, d, e) also contains a 3-cycle. We now study what
happens when H is normal and not equal to {1}, and show that such an H contains
enough elements to ensure that 3-cycles are present.

Recall from Section 14.2 that a normal subgroup H of $A_5$ satisfies gH = Hg
for each g in $A_5$. It follows that $gHg^{-1} = H$, that is, if h is in H, so is $ghg^{-1}$ for
any g in $A_5$.


14.8.3 Show that if H contains a 5-cycle (a, b, c, d, e) then it also contains the
5-cycle (g(a), g(b), g(c), g(d), g(e)) for each g in $A_5$.


14.8.4 Show that if H contains a product of 2-cycles (a, b)(c, d) then it also
contains the product of 2-cycles (g(a), g(b))(g(c), g(d)) for each g in $A_5$.


14.8.5 Deduce from Exercises 14.8.3 and 14.8.4, and calculations like those in 
Exercise 14.8.2, that H contains a 3-cycle. 


14.8.6 Deduce from the preceding exercises that H contains all 3-cycles. 


To prove that $A_5$ is simple, it now remains to prove that the normal subgroup
H ≠ {1} in fact contains all members of $A_5$.


14.8.7 By using 3-cycles to produce other elements of $A_5$, show that H = $A_5$.
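
The conclusion of these exercises can be confirmed by a brute-force computation, which is of course no substitute for the proof. The Python sketch below uses our own names (`normal_closure`, `mul`, `inverse`) and permutes the points 0, ..., 4 rather than 1, ..., 5. For each nonidentity h in $A_5$ it forms all conjugates $ghg^{-1}$ and then all products of conjugates. In a finite group these products make up the smallest normal subgroup containing h.

```python
from itertools import permutations

IDENTITY = tuple(range(5))


def is_even(p):
    """A permutation is even when it has an even number of inversions."""
    inversions = sum(1 for i in range(5) for j in range(i + 1, 5) if p[i] > p[j])
    return inversions % 2 == 0


def mul(f, g):
    """The product f . g, meaning 'g, then f'."""
    return tuple(f[g[i]] for i in range(5))


def inverse(f):
    inv = [0] * 5
    for i, image in enumerate(f):
        inv[image] = i
    return tuple(inv)


A5 = [p for p in permutations(range(5)) if is_even(p)]


def normal_closure(h):
    """Smallest normal subgroup of A5 containing h."""
    conjugates = {mul(mul(g, h), inverse(g)) for g in A5}
    subgroup, frontier = {IDENTITY}, [IDENTITY]
    while frontier:                      # close up under multiplication by conjugates
        x = frontier.pop()
        for c in conjugates:
            y = mul(x, c)
            if y not in subgroup:
                subgroup.add(y)
                frontier.append(y)
    return subgroup


sizes = {len(normal_closure(h)) for h in A5 if h != IDENTITY}
print(len(A5), sizes)   # 60 {60}
```

Every nonidentity element therefore generates all 60 elements as a normal subgroup, so the only normal subgroups are {1} and $A_5$ itself.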


Topology


PREVIEW 

In Chapter 11 we saw how Riemann found the topological concept of 
genus to be important in the study of algebraic curves. In the present 
chapter we will see how topology became a major field of mathematics, 
with its own methods and problems. 

Naturally, topology interacts with geometry, and it is common for topo- 
logical ideas to be noticed first in geometry. An important example is the 
Euler characteristic, which was originally observed as a characteristic of 
polyhedra. Later it was seen to be meaningful for arbitrary closed surfaces. 
Today, we tend to think that topology comes first, and that it controls what 
can happen in geometry. 

Topology also interacts with algebra. In this chapter we focus on the 
fundamental group, a group that describes the ways in which flexible loops 
can lie in a geometric object. On a sphere, all loops can be shrunk to a 
point, so the fundamental group is trivial. On the torus there are many 
nonshrinkable loops, but they are all combinations of two particular loops, 
a and b, where ab = ba. The latter relation is equivalent to $aba^{-1}b^{-1} = 1$,
meaning that the product $aba^{-1}b^{-1}$ of loops is shrinkable to a point.

Thus the fundamental group presents itself naturally with generators 
(basic loops) and relations (shrinkable products of loops). This establishes 
a connection between topology and combinatorial group theory, discussed 
in Section 14.7. In fact, in a sense, topology contains all of combinatorial 
group theory. This is both a blessing and a curse: it allows group theory to 
be used in topology, but it infects topology with the hardest problems of 
combinatorial group theory. 
15.1 Geometry and Topology


Topology deals with those properties that remain invariant under contin- 
uous transformations. In Klein’s Erlanger Programm (where it is briefly 
mentioned under its old name of analysis situs) topology is the “geometry” 
of groups of continuous invertible transformations, or homeomorphisms. 
The “spaces” to which transformations apply, and indeed the meaning of 
“continuous,” remain somewhat open. When these terms are interpreted 
in the most general way, as subject only to certain axioms (which do not 
concern us here), one has general topology. The theorems of general topol- 
ogy appear in fields ranging from set theory to analysis, but they are not 
very geometric in flavor. Geometric topology, the subject of this chapter, 
is obtained when the transformations are ordinary continuous functions 
on $\mathbb{R}^n$ or on certain subsets of $\mathbb{R}^n$. Examples are the “topological
equivalences” between surfaces we spoke about in Section 11.8.

Geometric topology is more recognizably “geometric” than general 
topology, though one would expect the “geometry” to be of a discrete and 
combinatorial kind. Ordinary geometric quantities—such as length, angle, 
and curvature—admit continuous variation and hence cannot be invariant 
under continuous transformations. Topologically invariant quantities are 
things such as the number of “pieces” of a figure or the number of “holes” 
in it. It turns out, though, that the discrete structures of topology are often 
reflected by discrete structures in ordinary geometry, such as polyhedra 
and tessellations. In surface topology, this geometric modeling of topo- 
logical structure is so complete that topology becomes essentially a part 
of ordinary geometry. “Ordinary” here means geometry with notions of 
length, angle, and curvature—not necessarily Euclidean geometry. In fact, 
the natural geometric models of most surfaces are hyperbolic. 

It remains to be seen whether topology as a whole will ever be sub- 
ordinate to ordinary geometry. This is so in three dimensions, and here 
too hyperbolic geometry is the most important geometry (see Thurston 
(1997) or Weeks (1985)). In this chapter we make a virtue of a necessity 
by confining our discussion mainly to the topology of surfaces. This is the 
only area that is sufficiently understandable and relevant when set against 
the background of the rest of this book. Fortunately, this area is also rich 
enough to illustrate some important topological ideas, while still being 
mathematically tractable and visual. We begin the discussion of surface 
topology at its historical starting point, the theory of polyhedra. 
15.2 Polyhedron Formulas of Descartes and Euler


The first topological property of polyhedra seems to have been discovered 
by Descartes around 1630. Descartes’s short paper on the subject is lost, 
but its contents are known from a copy made by Leibniz in 1676, discov- 
ered among Leibniz’s papers in 1860 and published in Prouhet (1860). A 
detailed study of this paper, including a translation and facsimile of the 
Leibniz manuscript, has been published by Federico (1982). 

The same property was rediscovered by Euler (1752), and it is now 
known as the Euler characteristic. If a polyhedron has V vertices, E edges, 
and F faces, then its Euler characteristic is V − E + F. Euler showed that
this quantity has a certain invariance by showing

V − E + F = 2

for all convex polyhedra, a result now known as the Euler polyhedron
formula. Descartes already had the same result implicitly in the pair of
formulas

P = 2F + 2V − 4,   P = 2E,

where P is the number of what Descartes called “plane angles”: corners
of faces determined by pairs of adjacent edges. The relation P = 2E then
follows from the observation that each edge participates in two corners. It
should be stressed that Descartes’s “plane angle” has nothing to do with
angle measure, and hence is just as topological a concept as Euler’s “edges.”
Thus Descartes’s result belongs to topology just as much as Euler’s does, 
even though it fails to isolate the concept of Euler characteristic quite as 
well. Some rather hairsplitting distinctions have been made between Euler 
and Descartes in an effort to show that Euler invented topology and 
Descartes did not (see Federico (1982) for a review of different opinions). 
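
As a quick numerical check of the two formulations, the following Python sketch tabulates the five regular polyhedra, using their standard vertex, edge, and face counts, and confirms both Euler's formula and Descartes's pair of formulas. The dictionary `SOLIDS` and the function name `check` are ours.

```python
# (V, E, F) for the five regular polyhedra
SOLIDS = {
    "tetrahedron": (4, 6, 4),
    "cube": (8, 12, 6),
    "octahedron": (6, 12, 8),
    "dodecahedron": (20, 30, 12),
    "icosahedron": (12, 30, 20),
}


def check(V, E, F):
    """Return (Euler characteristic, whether Descartes's two values of P agree)."""
    P_from_edges = 2 * E                 # Descartes: P = 2E
    P_from_faces = 2 * F + 2 * V - 4     # Descartes: P = 2F + 2V - 4
    return V - E + F, P_from_edges == P_from_faces


for name, (V, E, F) in SOLIDS.items():
    print(name, check(V, E, F))          # each line shows (2, True)
```

Eliminating P from Descartes's two formulas gives 2E = 2F + 2V − 4, which is Euler's formula multiplied by 2.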

Actually, neither of these mathematicians understood the polyhedron 
formula in a fully topological way. They both used nontopological con- 
cepts, such as angle measure, in their proofs, and they did not realize that 
“vertices,” “edges,” and “faces” are meaningful on any surface: edges need 
not be straight and faces need not be flat. Other early proofs of the Euler 
polyhedron formula also rely on angle measure and other ordinary geo- 
metric quantities. For example, that of Legendre (1794) assumes that the 
polyhedron can be projected onto the sphere, then uses the Harriot relation 
between angular excess and area for spherical polygons (Exercises 13.6.5, 
15.2.1, and 15.2.2). 


Probably the first to understand V − E + F purely topologically was
Poincaré (1895). In fact, Poincaré generalized the Euler characteristic to
n-dimensional figures, but in the case of polyhedra his essential observation
was this: a vertex divides an edge into two edges, and an edge divides a
face into two faces. It follows that any subdivision of edges or faces of a
polyhedron leaves V − E + F unchanged: if a new vertex is introduced on
an edge, V and E both increase by 1; if a new edge is introduced across a
face, E and F both increase by 1. The reverse processes of amalgamation,
where they make sense, likewise leave V − E + F unchanged.

The constancy of V − E + F over, say, the class of convex polyhedra
then follows if it can be shown that any polyhedron $P_1$ in the class can be
converted to any other, $P_2$, by subdivisions and amalgamations. A plausible
argument for this, due to Riemann (1851), is to view $P_1$ and $P_2$ as
subdivisions of the same surface, say a sphere. Assuming that the edges of
$P_1$ and $P_2$ meet only finitely often, superimposing $P_1$ on $P_2$ gives a common
subdivision $P_3$ whose V − E + F value is therefore the same as that
of $P_1$ and $P_2$. Hence the V − E + F values of $P_1$ and $P_2$ are equal. A more
general approach, which also yields the value of V − E + F for nonspherical
surfaces, is explained in the next section.

An engaging recent account of the Euler characteristic and its history 
is Richeson (2008). 


EXERCISES 
Here is the proof of the Euler polyhedron formula by Legendre (1794). 


15.2.1 Consider the projection of a convex polyhedron onto a sphere, whose faces
are therefore spherical polygons. Use the fact that

$$\text{area of a spherical } n\text{-gon} = \text{angle sum} - (n-2)\pi$$

to conclude that

$$\text{total area} = 4\pi = \sum(\text{all angles}) - \pi \sum(\text{all } n) + 2\pi F.$$

15.2.2 Show also that

$$\sum(\text{all } n) = 2E, \qquad \sum(\text{all angles}) = 2\pi V,$$

whence

$$V - E + F = 2.$$


The invariance of the Euler characteristic gives a simple topological proof that
there are only five regular polyhedra. In fact, it shows that only five polyhedra
are topologically regular in the following sense: for some m, n > 2 their “faces”
are topological m-gons on a topological sphere, n of which meet at each vertex.
We show as follows that V − E + F = 2 allows only the pairs

(m, n) = (3, 3), (3, 4), (3, 5), (4, 3), (5, 3),

corresponding to the known regular polyhedra (Section 2.2).

15.2.3 Given that there are F faces, deduce that E = mF/2 and V = mF/n.


15.2.4 Apply the formula V − E + F = 2 to conclude that 4n/(2m + 2n − mn) is
a positive integer.


15.2.5 Show that 2m + 2n − mn > 0, that is, 2/m + 2/n > 1, only for the above pairs
(m, n).


15.2.6 Also check that 2m + 2n − mn divides 4n for these pairs.
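
The search in these exercises is small enough to run by machine. The Python sketch below is an illustration, not a proof: it tries all m, n from 3 up to an arbitrary bound `limit` (our assumption, since 2m + 2n − mn is negative for all larger values) and keeps the pairs for which 4n/(2m + 2n − mn) is a positive integer, recovering F, E, and V from the formulas of Exercises 15.2.3 and 15.2.4.

```python
def regular_pairs(limit=50):
    """Pairs (m, n) with 2m + 2n - mn > 0 dividing 4n, with the resulting V, E, F."""
    found = []
    for m in range(3, limit):
        for n in range(3, limit):
            d = 2 * m + 2 * n - m * n
            if d > 0 and (4 * n) % d == 0:
                F = 4 * n // d          # from V - E + F = 2
                E = m * F // 2          # each edge lies on two faces
                V = m * F // n          # n faces meet at each vertex
                found.append((m, n, V, E, F))
    return found


for m, n, V, E, F in regular_pairs():
    print(f"(m, n) = ({m}, {n}):  V = {V}, E = {E}, F = {F}")
```

The five lines of output correspond to the tetrahedron, octahedron, icosahedron, cube, and dodecahedron, in that order.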


15.3 The Classification of Surfaces


Between the 1850s and the 1880s, several different lines of research led to 
the demand for a topological classification of surfaces. One line, descend- 
ing from Euler, was the classification of polyhedra. Another was the 
Riemann surface representation of algebraic curves, coming from Riemann 
(1851, 1857). Related to this was the problem of classifying symmetry 
groups of tessellations, considered by Poincaré (1882) and Klein (1882b) 
(see Section 15.4). Finally, there was the problem of classifying smooth 
closed surfaces in ordinary space (Möbius (1863)). These different lines
of research converged when it was realized that each “surface” could be 
subdivided into “faces” by “edges” so as to become a generalized polyhe- 
dron. The generalized polyhedra were traditionally called closed surfaces, 
and are now described by topologists as compact and without boundary. 

The subdivision argument for the invariance of the Euler characteristic
V − E + F applies to any such polyhedron, not just those homeomorphic
to the sphere and not just those with straight edges and flat faces. Various
mathematicians, such as Riemann (1851) and Jordan (1866), came
to the conclusion that any closed surface is determined, up to homeomorphism,
by its Euler characteristic. It also seemed that the different possible
Euler characteristics were realized by the “normal form” surfaces seen in
Figure 15.1, which were discovered by Möbius (1863). It is certainly plausible
that these forms are distinct, topologically, because of their different
numbers of “holes.” The main part of the proof is to show that any closed
surface is homeomorphic to one of them.
The assumptions of Riemann (that the surface is a Riemann surface)
and Möbius (that the surface is smoothly embedded in $\mathbb{R}^3$) were a little too
special to yield a purely topological proof, and they also contained a hidden
assumption of orientability (“two-sidedness”). A rigorous proof, from an
axiomatic definition of generalized polyhedron, was given by Dehn and
Heegaard (1907). The closed orientable surfaces indeed turn out to be those
pictured in Figure 15.1, but in addition there are nonorientable surfaces,
which are not homeomorphic to orientable surfaces.




Figure 15.1: Surfaces of genus 0, 1, 2, 3, ... 


A nonorientable surface may be defined as one that contains a Möbius
band, a nonclosed surface discovered independently by Möbius and Listing
in 1858 (Figure 15.2).


Figure 15.2: A Möbius band


Closed nonorientable surfaces cannot occur as Riemann surfaces, nor 
can they lie in $\mathbb{R}^3$ without crossing themselves; nevertheless, they include
some important surfaces, such as the projective plane (Exercise 7.5.3). The 
nonorientable surfaces are also determined, up to homeomorphism, by the 
Euler characteristic. 

The Möbius forms of closed orientable surfaces were given standard
polyhedral structures by Klein (1882b). These are “minimal” subdivisions 
with just one face and, except for the sphere, with just one vertex. When the 
Klein subdivision of a surface is cut along its edges, one obtains a funda- 
mental polygon, from which the surface may be reconstructed by pasting 
suitable edges. Figure 15.3 shows how to cut a torus, which can then be 
flattened to a rectangle. (The process was shown in reverse in Figure 12.4.)


Figure 15.3: Cutting a torus 


Figure 15.4 shows genus 2. The surface is cut open along a figure eight 
curve on the top, then further cut through each “handle.” The cut surface 
can then be spread out flat as an octagon whose eight corners are seen 
coming apart in the middle of the picture. 


Figure 15.4: Cutting a genus 2 surface 


It is often more convenient to work with the polygon rather than the
surface or its polyhedral structure. For example, since Brahana (1921),
most proofs of the classification theorem have used polygons rather than
polyhedra, “cutting and pasting” them (instead of subdividing and
amalgamating) until Klein’s fundamental polygons are obtained. The
fundamental polygon gives a very easy calculation of the Euler characteristic χ and
Exercise 15.3.1 shows it to be related to the genus g (number of “holes”)
by

χ = 2 − 2g.
EXERCISES


15.3.1 Show that the standard polyhedron for a surface of genus g ≥ 1 has V = 1,
E = 2g, F = 1, whence χ = 2 − 2g.


The standard polygon for the genus g surface has a boundary path of the form
$a_1b_1a_1^{-1}b_1^{-1}a_2b_2a_2^{-1}b_2^{-1}\cdots a_gb_ga_g^{-1}b_g^{-1}$, where successive letters denote successive
edges and those with exponents −1 have oppositely directed arrows. Edges with
the same letter are pasted together, with arrows matching.


15.3.2 Each sequence $a_ib_ia_i^{-1}b_i^{-1}$ is called a handle. Justify this term by drawing
the surface that results from pasting together the matching edges of the
polygon bounded by $a_ib_ia_i^{-1}b_i^{-1}c$. The result should be a “handle-shaped”
surface with boundary curve c.


Another fundamental polygon is the “2n-gon with opposite edges pasted
together,” that is, the polygon with boundary of the form
$a_1a_2\cdots a_na_1^{-1}a_2^{-1}\cdots a_n^{-1}$.


15.3.3 Show that for both n = 2 and n = 3 the surface obtained from the polygon
$a_1a_2\cdots a_na_1^{-1}a_2^{-1}\cdots a_n^{-1}$ is a torus.


15.3.4 Show that if n is even, the vertices of the polygon $a_1a_2\cdots a_na_1^{-1}a_2^{-1}\cdots a_n^{-1}$
become a single vertex after pasting, and if n is odd they become two.
Hence find the Euler characteristic of the surface for any n.
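
The vertex counts in these exercises can be explored by machine. The following Python sketch is one possible approach, with an encoding of our own: a boundary word is a list of nonzero integers, where k stands for the kth edge letter and −k for the same letter with exponent −1. Edge number j of the polygon runs from corner j to corner j + 1, and pasting two edges with the same letter identifies their tails and their heads. A simple union-find structure then counts how many vertices remain.

```python
def euler_characteristic(word):
    """Return (V, E, F, V - E + F) for the surface pasted from one polygon."""
    n = len(word)
    parent = list(range(n))              # union-find over the n polygon corners

    def find(x):
        while parent[x] != x:
            parent[x] = parent[parent[x]]
            x = parent[x]
        return x

    def union(x, y):
        parent[find(x)] = find(y)

    ends = {}                            # letter -> (tail, head) of its first occurrence
    for j, letter in enumerate(word):
        start, end = j, (j + 1) % n
        # An exponent -1 means the arrow points against the boundary direction.
        tail, head = (start, end) if letter > 0 else (end, start)
        label = abs(letter)
        if label in ends:
            union(tail, ends[label][0])  # paste with arrows matching
            union(head, ends[label][1])
        else:
            ends[label] = (tail, head)

    V = len({find(v) for v in range(n)})
    E = len(ends)
    F = 1
    return V, E, F, V - E + F


print(euler_characteristic([1, 2, -1, -2]))                  # torus
print(euler_characteristic([1, 2, -1, -2, 3, 4, -3, -4]))    # standard genus 2 polygon
print(euler_characteristic([1, 2, 3, -1, -2, -3]))           # hexagon, opposite edges pasted
```

The sketch assumes every letter occurs exactly twice, so that the result is a closed surface.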


15.4 Surfaces and Planes 


In Section 12.5 we noticed that an elliptic function maps a plane onto a 
torus. Such mappings are also interesting in the topological context, where 
they are called universal coverings. In general, a mapping $\tilde{S} \to S$ of a
surface $\tilde{S}$ onto a surface S is called a covering if it is a homeomorphism
locally, that is, when restricted to sufficiently small pieces of $\tilde{S}$. The
mapping of the plane onto the torus in Section 12.5 is a covering because it
is a homeomorphism when restricted to any region smaller than a period 
parallelogram. 

Another example we already know is the mapping of the sphere onto 
the projective plane given by Klein (1874) (Section 7.5). This map sends 
each pair of antipodal points of the sphere to the same point of the pro- 
jective plane, and hence is a homeomorphism when restricted to any part 
of the sphere smaller than a hemisphere. Yet another is Beltrami’s (1868a) 
covering of the pseudosphere by a horocyclic sector (Section 13.7). Topo- 
logically, this covering is the same as the covering of a half-cylinder by a 
half-plane when the plane is “wrapped” around the cylinder. 
All these coverings are universal in the sense that the covering surface
$\tilde{S}$ (sphere or plane) can be covered only by $\tilde{S}$ itself.

Since the sphere is covered only by itself, the interesting coverings of
orientable surfaces are those for genus ≥ 1 (or Euler characteristic ≤ 0). All
of these surfaces can be covered by planes. Moreover, each nonorientable
surface can be doubly covered by an orientable surface in the same way
that the projective plane is covered by the sphere, so the main thing to
understand is the covering of orientable surfaces of genus ≥ 1 by planes.

The basic idea is due to Schwarz, and it became generally known 
through a letter from Klein (1882a) to Poincaré. To construct the univer- 
sal covering of a surface S, take infinitely many copies of a fundamental 
polygon F for S and arrange them in the plane so that adjacent copies of 
F meet in the same way that F meets itself on S. For example, the torus T 
in Figure 15.5 has the rectangular fundamental polygon F shown, which
meets itself along the red and blue edges in T (where the arrows indicate 
that edges must agree in direction as well as color). 


Figure 15.5: From the torus to its fundamental polygon 


If instead we take infinitely many separate copies of F and join adjacent
red and blue edges, then we obtain a plane $\tilde{T}$, tessellated as in Figure 15.6.
The universal covering $\tilde{T} \to T$ is then defined by mapping each copy of
the rectangle F in $\tilde{T}$ in the natural way onto the F in T.




Figure 15.6: Tessellation of the torus cover 


The tessellation of Figure 15.6 can of course be realized by rectangles 
in the Euclidean plane. We can therefore impose a Euclidean geometry on 
the torus by defining the distance between (sufficiently close) points on the 
torus to be the Euclidean distance between appropriate preimage points in 
the plane. In particular, the “straight lines” (geodesics) on the torus are the 
images of straight lines in the Euclidean plane. The torus geometry is not 
quite the geometry of the plane, of course, since there are closed geodesics, 
such as the images of the line segments a and b. However, it is Euclidean 
when restricted to sufficiently small regions. For example, the angle sum 
of each triangle on the torus is π.
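
To make the transported distance concrete, here is a minimal Python sketch for a torus obtained from a rectangle of width `width` and height `height`. These parameter names, the function name `torus_distance`, and the use of coordinates reduced modulo the side lengths are all our own assumptions. It takes the “appropriate preimage points” to be the pair of preimages that lie closest together in the plane.

```python
import math


def torus_distance(p, q, width=1.0, height=1.0):
    """Euclidean distance between the closest preimages of p and q in the plane."""
    dx = abs(p[0] - q[0]) % width
    dy = abs(p[1] - q[1]) % height
    dx = min(dx, width - dx)      # going around the torus may be shorter
    dy = min(dy, height - dy)
    return math.hypot(dx, dy)


# Two points near opposite edges of the rectangle are close on the torus.
print(torus_distance((0.1, 0.1), (0.9, 0.1)))   # about 0.2, not 0.8
```

For points that are close together this agrees with the ordinary distance inside one rectangle, which is why the geometry is Euclidean on sufficiently small regions.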

For surfaces of genus >1—that is, of negative Euler characteristic— 
the angle sum 2π of the fundamental polygon predicts negative curvature,
and hence the natural covering plane should be hyperbolic. This can also 
be seen from the combinatorial nature of the tessellation on the universal 
cover. For example, the fundamental polygon F of the surface S of genus 
2 is an octagon, as we saw in Figure 15.4. 

In the universal covering, eight of these octagons have to meet at each 
vertex, since the eight corners of the single F meet on S. Such a tessellation 
is impossible, by regular octagons, in the Euclidean plane, but it exists in 
the hyperbolic plane, as Figure 15.7 shows. 
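
The mismatch with Euclidean geometry is a matter of simple arithmetic, sketched below in Python with function names of our own. The fundamental polygon for genus g has 4g corners, all of which meet at one vertex of the surface, so its angles must add up to 2π, whereas a Euclidean 4g-gon has angle sum (4g − 2)π.

```python
import math


def euclidean_angle_sum(g):
    """Angle sum of a Euclidean polygon with 4g sides."""
    return (4 * g - 2) * math.pi


def angular_defect(g):
    """Amount by which the required angle sum 2*pi falls short of the Euclidean one."""
    return euclidean_angle_sum(g) - 2 * math.pi


def corner_angles_in_degrees(g):
    """(angle of a regular Euclidean 4g-gon, angle needed for 4g corners to fit)."""
    sides = 4 * g
    return (math.degrees(euclidean_angle_sum(g) / sides), 360.0 / sides)


print(corner_angles_in_degrees(1))   # square: both angles are 90 degrees
print(corner_angles_in_degrees(2))   # octagon: 135 degrees against the 45 needed
print(angular_defect(2) / math.pi)   # defect of 4*pi for genus 2
```

The defect is zero only for the torus. For genus 2 and higher it is positive, which is possible only for polygons in the hyperbolic plane.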

Figure 15.7: Tessellation of the genus-2 covering


In fact, this tessellation is obtained by amalgamating triangles in the
Gauss tessellation (Figure 13.24). The tessellations for general genus >1
can similarly be realized geometrically in the hyperbolic plane, and they
were among the hyperbolic tessellations considered by Poincaré (1882)
and Klein (1882b). The distance function, hence the curvature and local
geometry, can be transported from the covering plane to the surface as we
did above for the torus.


EXERCISES 


When surfaces of genus >1 are realized as surfaces of constant negative cur- 
vature, their genus can be read off from their area. 


15.4.1 Show that the fundamental polygon for an orientable surface of genus p
is a 4p-gon with angle sum 2π.


15.4.2 Deduce that its Euler characteristic is proportional to its angular defect and 
hence to its area. 


15.4.3 Conclude, using Exercise 15.3.1, that the area determines the genus. 




15.5 The Fundamental Group 


Another way to explore the meaning of the universal cover $\tilde{S}$ is to use it
to plot paths on the surface S. As a point P moves on S, each preimage $\tilde{P}$
of P moves “above it” on $\tilde{S}$. This means in particular that as P crosses an
edge of the fundamental polygon on S, $\tilde{P}$ crosses from one polygon to its
neighbor on $\tilde{S}$. So $\tilde{P}$ will not necessarily return to its starting point, even
when P does. In fact, the displacement of $\tilde{P}$ measures the extent to which
P winds around the surface S. Figure 15.8 shows an example. As P winds
once around the torus from O, more or less in the direction of the red loop,
$\tilde{P}$ wanders from one end $\tilde{O}^{(1)}$ to the other $\tilde{O}^{(2)}$ of a red segment on $\tilde{S}$.




Figure 15.8: Plotting on the covering surface 


We say that closed paths p, p′ with initial point O on S “wind in the
same way,” or are homotopic, if p can be deformed into p′ with O fixed
and without leaving the surface. Now if the path p of P is deformed into
p′, with O fixed, then the path $\tilde{p}$ of $\tilde{P}$ is deformed into a $\tilde{p}'$ with the same
initial and final points, $\tilde{O}^{(1)}$ and $\tilde{O}^{(2)}$, as $\tilde{p}$. Hence each homotopy class
corresponds simply to a displacement of the universal cover $\tilde{S}$ that moves $\tilde{O}^{(1)}$
to $\tilde{O}^{(2)}$. The different preimages $\tilde{P}$ will of course start at different
preimages $\tilde{O}^{(i)}$ of O, but a single displacement of $\tilde{S}$ moves them all to their final
positions $\tilde{O}^{(j)}$. Moreover, the displacement moves the whole tessellation
of $\tilde{S}$ onto itself: it is a rigid motion of the tessellation.

Thus from the topological notion of homotopic closed paths we arrive 
back in ordinary geometry. We also arrive at a group called the fundamental
group of S. Geometrically, it is the group of motions of $\tilde{S}$ that map the
tessellation onto itself (mapping each edge to an edge with the same color 
and direction). Topologically, it is the group of homotopy classes of closed 
paths, with a common initial point O, on S. The product of homotopy 
classes is defined by successive traversal of representative paths. 
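
For the torus this correspondence is easy to compute with, because the motions of the covering plane are translations of the rectangular tessellation. In the Python sketch below, which is our own illustration, a closed path is recorded as a word in the letters a and b, with capital letters A and B standing for the inverse loops (a notation we assume only for this example). Its homotopy class is the total displacement of the lifted path, counted in rectangles.

```python
STEP = {"a": (1, 0), "A": (-1, 0), "b": (0, 1), "B": (0, -1)}


def displacement(word):
    """Displacement in the covering plane of the lift of the path spelled by word."""
    x = y = 0
    for letter in word:
        dx, dy = STEP[letter]
        x, y = x + dx, y + dy
    return (x, y)


def homotopic(word1, word2):
    """On the torus, closed paths are homotopic exactly when their lifts end together."""
    return displacement(word1) == displacement(word2)


print(displacement("abAB"))      # (0, 0): the lift closes up, so the loop shrinks to a point
print(homotopic("ab", "ba"))     # True: a and b commute
print(homotopic("aab", "ab"))    # False: the paths wind a different number of times
```

For surfaces of higher genus the motions no longer commute, so a pair of integers does not suffice and the displacement has to be tracked in the hyperbolic tessellation instead.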
The fundamental group was first defined by Poincaré (1895). Poincaré
defined it for much more general figures, whose universal covers are not so 
apparent, so he did not generally view the fundamental group as a covering 
motion group. However, Poincaré had already studied groups of motions of 
tessellations in his (1882), using linear fractional transformations. Recon- 
sidering these earlier results topologically in his (1904), he arrived at the 
interpretation above. It includes, as we saw in Section 14.7, a presentation 
of the group by generators and relations. This discovery was very influen- 
tial on the later work of Dehn (1912) and Nielsen (1927), which led ulti- 
mately to a recent surge of interest in hyperbolic geometry and geometric 
group theory. For some of these developments, see Serre (2003) and Clay 
and Margalit (2017). 

The more general notion of fundamental group in Poincaré (1895) has 
also been influential outside topology. It turns out, for example, that for any 
“reasonably described” figure $\mathcal{F}$ it is possible to compute generators and
defining relations for the fundamental group of $\mathcal{F}$. The defining relations of
a fundamental group can be quite arbitrary (in fact, completely arbitrary, as 
was shown by Dehn (1910) and Seifert and Threlfall (1934), p. 180). So the 
question arises: can the properties of a group be determined from its defin- 
ing relations? One would like to know, for example, when two different 
sets of relations define the same group. The latter question was raised by 
Tietze (1908) in the first paper to follow up Poincaré’s work. Tietze made 
the remarkable conjecture—which could not even be precisely formulated 
at the time—that the problem is unsolvable. The isomorphism problem 
for groups, as it came to be known, was indeed shown to be unsolvable 
by Adyan (1957). Adyan’s result was based on the theory of algorithms, 
which will be outlined in Chapter 17. 

By combining Adyan’s result with some of Tietze (1908) and the result 
of Seifert and Threlfall mentioned above, Markov (1958) was able to show 
the unsolvability of the homeomorphism problem. This is the problem of 
deciding, given “reasonably described” figures $\mathcal{F}_1$ and $\mathcal{F}_2$, whether $\mathcal{F}_1$ is
homeomorphic to $\mathcal{F}_2$. The figures $\mathcal{F}_1$ and $\mathcal{F}_2$ can in fact be taken to be
4-dimensional “polyhedra.” (A complete proof of the unsolvability of the 
isomorphism problem and homeomorphism problem may be found in 
Stillwell (1993), and its history may be found in Stillwell (1982).) Thus 
Poincaré’s construction of the fundamental group led in the end to a quite 
unexpected conclusion: the basic problem of topology is unsolvable. 
EXERCISES


In the following exercises it will be helpful to view the fundamental group as 
the group of motions of the universal covering plane, diagrammed in the previous 
section. The diagram shows that any sequence of motions equal to the identity 
corresponds to a closed path of edges in the diagram. 


15.5.1 Explain why the fundamental group of the torus is generated by elements
a and b with defining relation

$$aba^{-1}b^{-1} = 1.$$


15.5.2 Similarly, explain why the fundamental group of the surface of genus 2 is
generated by elements $a_1, b_1, a_2, b_2$ with defining relation

$$a_1b_1a_1^{-1}b_1^{-1}a_2b_2a_2^{-1}b_2^{-1} = 1.$$


15.5.3 Show that the former group is commutative but the latter is not. 


Commutative Algebra


PREVIEW 


In modern algebra the first important concept to come to light was that 
of groups, as we have seen in Chapter 14. The distinctive feature of most 
groups, which sets them apart from traditional algebra, is noncommuta- 
tive multiplication. In contrast, the key concepts of modern commutative 
algebra—rings, fields, and vector spaces—came to light only later, per- 
haps for the simple reason that at first they did not look different from 
traditional algebra. 

Indeed, the concepts of ring and field are exemplified by the ancient 
concepts of integers and rational numbers. Their defining properties—the 
axioms for rings and fields—seem merely to encapsulate the common rules 
for calculation. It was noticed only in the 19th century that the ring and 
field properties are shared by systems quite different from the rational 
numbers, so experience with rational numbers can be used in other mathe- 
matical domains. 

However, the domains that share the basic rules of calculation with 
integers and rational numbers may differ in other respects, especially in 
the nature of primes, where the very useful property of unique prime fac- 
torization may be lost. This raises the problem of generalizing the concept 
of prime, and finding conditions under which unique prime factorization 
may be regained. 

This problem, uncovered by Kummer in 1844, spurred much of the 
development of commutative algebra in the 19th and early 20th centuries. 
It is this development—called algebraic number theory—that we follow 
in the later sections of this chapter. 
16.1 Linear Algebra


Linear algebra began with the problem of solving sets of linear equations 
in several unknowns, which was solved by Chinese mathematicians about 
2000 years ago by the method we now call Gaussian elimination. As men- 
tioned in Section 5.2, the Chinese had a tool called the “counting board” 
that was ideal for such calculations, since it displayed the coefficients of 
the system in a square array, which could be operated upon just as we 
operate on matrices. 
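
In modern terms the method can be sketched as follows. This Python version is only an illustration under our own choices: it works with exact fractions, reduces the array all the way to the identity (the Gauss-Jordan variant of elimination), and uses the hypothetical name `gaussian_elimination`. It makes no claim to reproduce the layout of the counting board.

```python
from fractions import Fraction


def gaussian_elimination(A, b):
    """Solve the n x n system A x = b exactly; raise ValueError if A is singular."""
    n = len(A)
    # Augmented array: each row holds the coefficients followed by the right-hand side.
    M = [[Fraction(x) for x in row] + [Fraction(rhs)] for row, rhs in zip(A, b)]
    for col in range(n):
        pivot = next((r for r in range(col, n) if M[r][col] != 0), None)
        if pivot is None:
            raise ValueError("matrix is singular")
        M[col], M[pivot] = M[pivot], M[col]           # bring a nonzero entry up
        M[col] = [x / M[col][col] for x in M[col]]    # scale the pivot to 1
        for r in range(n):
            if r != col and M[r][col] != 0:           # clear the rest of the column
                factor = M[r][col]
                M[r] = [x - factor * y for x, y in zip(M[r], M[col])]
    return [M[r][n] for r in range(n)]


# 2x + y = 3 and x + 3y = 5 have the solution x = 4/5, y = 7/5.
print(gaussian_elimination([[2, 1], [1, 3]], [3, 5]))
```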

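A harder problem is finding a formula that expresses the solution of a
system

$$\begin{aligned}
a_{11}x_1 + a_{12}x_2 + \cdots + a_{1n}x_n &= b_1 \\
a_{21}x_1 + a_{22}x_2 + \cdots + a_{2n}x_n &= b_2 \\
&\;\;\vdots \\
a_{n1}x_1 + a_{n2}x_2 + \cdots + a_{nn}x_n &= b_n
\end{aligned}$$

as a function of the coefficients $a_{ij}$ and $b_i$. The solution is given by the rule

$$x_i = \frac{\det A_i}{\det A},$$

where det A is the determinant of the matrix

$$A = \begin{pmatrix}
a_{11} & a_{12} & \cdots & a_{1n} \\
a_{21} & a_{22} & \cdots & a_{2n} \\
\vdots & \vdots & & \vdots \\
a_{n1} & a_{n2} & \cdots & a_{nn}
\end{pmatrix}$$

and $A_i$ is the matrix obtained from A by replacing its ith column by the
column of values $b_j$.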
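
The rule translates directly into a short program. The Python sketch below is an illustration under our own assumptions: the determinant is computed by cofactor expansion along the first row, which is practical only for small systems, the coefficients are integers, and the function names `det` and `cramer` are ours.

```python
from fractions import Fraction


def det(M):
    """Determinant by cofactor expansion along the first row."""
    if len(M) == 1:
        return M[0][0]
    total = 0
    for j, entry in enumerate(M[0]):
        minor = [row[:j] + row[j + 1:] for row in M[1:]]
        total += (-1) ** j * entry * det(minor)
    return total


def cramer(A, b):
    """Solve A x = b for integer data via x_i = det(A_i) / det(A)."""
    d = det(A)
    if d == 0:
        raise ValueError("det A = 0, so the rule does not apply")
    solution = []
    for i in range(len(A)):
        # A_i is A with its ith column replaced by the column of values b.
        A_i = [row[:i] + [b[k]] + row[i + 1:] for k, row in enumerate(A)]
        solution.append(Fraction(det(A_i), d))
    return solution


# 2x + y = 3 and x + 3y = 5: det A = 5, det A_1 = 4, det A_2 = 7.
print(cramer([[2, 1], [1, 3]], [3, 5]))
```

Because the expansion takes on the order of n! steps, elimination is far more efficient for actually solving systems. The interest of the rule is that it gives the solution as an explicit formula.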

This rule is commonly called Cramer’s rule because of its appearance 
in the book Cramer (1750). However, it was known earlier. In a remark- 
able instance of independent discovery, Leibniz and the Japanese math- 
ematician Seki discovered determinants around 1680, and independently 
developed their properties over the next few decades. Knobloch (2013) 
reveals the extent of Leibniz’s results on determinants, which were not 
published in his lifetime. Determinants also underlie the elimination pro- 
cess for polynomial equations, special cases of which were discovered by 
the Chinese, as we mentioned in Section 5.2. 
Determinants were rediscovered several times, and they became the
subject of a substantial theory in the 19th century. As late as 1960,
determinants were considered important enough to be the subject of the
four-volume history, Muir (1960). Their theory, well into the 20th century,
completely overshadowed what we now call “linear algebra”; namely, the
theory of vector spaces.

In this chapter we will assume (as we already have in some earlier 
chapters) that the reader knows the basic rules for calculating determi- 
nants. The only theoretical property of determinants we require is that lin- 
ear homogeneous equations have a nonzero solution only if their determi- 
nant is zero. (This follows, for example, from Cramer’s rule and the prop- 
erty that a determinant with a zero column is zero.) As for vector spaces, 
we will develop their theory from scratch, because it is a simple but good 
example of modern algebraic thinking. 


16.2 Vector Spaces 


Grassmann (1844) introduced a very general, and sophisticated, theory of 
vector spaces, with inner and outer products. Because the idea was so new 
and, alas, very poorly explained by Grassmann, it was not understood by 
his contemporaries and went virtually unnoticed. Three years later, in an 
essay competition on the subject of Leibniz’s ideas about symbolic geom- 
etry, Grassmann (1847) made a second attempt. This time he emphasized 
the inner product as an encapsulation of the Pythagorean theorem, which 
makes a vector space “Euclidean.” Although Grassmann’s essay won the 
prize, his ideas did not really catch on until Peano formalized the concept 
of vector space (with due credit to Grassmann) in the 1880s. 

The vector space axioms concern objects called vectors, denoted by 
u,v,w,... which can be added and multiplied by numbers a, b,c, ..., called 
scalars in this context. The vectors include a zero vector 0 and, for each 
vector u, its negative —u. Then the axioms are, first, axioms for addition: 


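\[
\begin{aligned}
\mathbf{u} + \mathbf{v} &= \mathbf{v} + \mathbf{u},\\
\mathbf{u} + (\mathbf{v} + \mathbf{w}) &= (\mathbf{u} + \mathbf{v}) + \mathbf{w},\\
\mathbf{u} + \mathbf{0} &= \mathbf{u},\\
\mathbf{u} + (-\mathbf{u}) &= \mathbf{0}.
\end{aligned}
\]

Then the following axioms for multiplication by scalars:

\[
\begin{aligned}
a(b\mathbf{u}) &= (ab)\mathbf{u},\\
1\mathbf{u} &= \mathbf{u},\\
a(\mathbf{u} + \mathbf{v}) &= a\mathbf{u} + a\mathbf{v},\\
(a + b)\mathbf{u} &= a\mathbf{u} + b\mathbf{u}.
\end{aligned}
\]

Since $a, b, c, \ldots$ are assumed to be real, these axioms of Peano are properly
called axioms for a real vector space.

Grassmann developed his theory with the aim of algebraically creating
a form of geometry, like Euclid’s but without restriction to two or three
dimensions. The concept of dimension of a vector space $V$ arises from the
concept of basis, which formalizes the idea of coordinates in $V$. A set of
vectors $\mathbf{i}_1, \mathbf{i}_2, \ldots, \mathbf{i}_n$ form a basis of $V$ if:

• Each $\mathbf{u} \in V$ can be written in the form
\[
\mathbf{u} = u_1\mathbf{i}_1 + u_2\mathbf{i}_2 + \cdots + u_n\mathbf{i}_n \quad\text{for some } u_1, u_2, \ldots, u_n \in \mathbb{R},
\]
in which case we say that $\mathbf{i}_1, \mathbf{i}_2, \ldots, \mathbf{i}_n$ span $V$.

• A vector of the form
\[
a_1\mathbf{i}_1 + a_2\mathbf{i}_2 + \cdots + a_n\mathbf{i}_n, \quad\text{for } a_1, a_2, \ldots, a_n \in \mathbb{R},
\]
equals $\mathbf{0}$ only if $a_1 = a_2 = \cdots = a_n = 0$, in which case we say that
$\mathbf{i}_1, \mathbf{i}_2, \ldots, \mathbf{i}_n$ are linearly independent.

It follows from these conditions that each vector $\mathbf{u}$ is uniquely expressible
in the form $u_1\mathbf{i}_1 + u_2\mathbf{i}_2 + \cdots + u_n\mathbf{i}_n$, so $u_1, u_2, \ldots, u_n$ serve as coordinates of
$\mathbf{u}$ with respect to the basis $\mathbf{i}_1, \mathbf{i}_2, \ldots, \mathbf{i}_n$. Grassmann proved that any two
bases of the same vector space $V$ (assumed to have finite basis) are of the
same size, $n$, called the dimension of $V$.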

Any real vector space of dimension $n$ is essentially the same as the
space $\mathbb{R}^n$ of ordered $n$-tuples $\mathbf{u} = (u_1, u_2, \ldots, u_n)$, where $u_1, u_2, \ldots, u_n$ are
real numbers called the components of $\mathbf{u}$. In this realization, vectors are
added to each other, and multiplied by numbers, componentwise:

\[
(u_1, u_2, \ldots, u_n) + (v_1, v_2, \ldots, v_n) = (u_1 + v_1, u_2 + v_2, \ldots, u_n + v_n),
\]

\[
a(u_1, u_2, \ldots, u_n) = (au_1, au_2, \ldots, au_n).
\]

The inner product $\mathbf{u} \cdot \mathbf{v}$ of vectors

\[
\mathbf{u} = (u_1, u_2, \ldots, u_n) \quad\text{and}\quad \mathbf{v} = (v_1, v_2, \ldots, v_n)
\]

is defined by

\[
\mathbf{u} \cdot \mathbf{v} = u_1v_1 + u_2v_2 + \cdots + u_nv_n.
\]

The inner product captures the concept of length of a vector (given by the
Pythagorean theorem, first for $n = 2$, then inductively for larger $n$)

\[
|\mathbf{u}| = \sqrt{u_1^2 + u_2^2 + \cdots + u_n^2},
\]

because $\mathbf{u} \cdot \mathbf{u} = u_1^2 + u_2^2 + \cdots + u_n^2 = |\mathbf{u}|^2$.

The inner product also captures the concept of angle because (less obviously)

\[
\mathbf{u} \cdot \mathbf{v} = |\mathbf{u}||\mathbf{v}| \cos\theta,
\]

where $\theta$ is the angle between the lines from $\mathbf{0}$ to the points $\mathbf{u}$ and $\mathbf{v}$
respectively. In particular, these lines are perpendicular when $\mathbf{u} \cdot \mathbf{v} = 0$. Because
of this, many classical theorems about right angles have very slick proofs
using the inner product (see exercises below).
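
The three formulas for inner product, length, and angle translate directly into code. The following is a minimal sketch in Python, with vectors represented as tuples of numbers; the function names `inner`, `length`, and `angle` are our own choices, not notation from the text.

```python
import math

def inner(u, v):
    """Inner product u . v = u1*v1 + u2*v2 + ... + un*vn."""
    return sum(a * b for a, b in zip(u, v))

def length(u):
    """Length |u| = sqrt(u . u), by the Pythagorean theorem."""
    return math.sqrt(inner(u, u))

def angle(u, v):
    """Angle theta between nonzero u and v, from u . v = |u||v| cos(theta)."""
    c = inner(u, v) / (length(u) * length(v))
    # Clamp to [-1, 1] to guard against floating-point rounding.
    return math.acos(max(-1.0, min(1.0, c)))

print(length((3, 4)))           # the 3-4-5 right triangle
print(inner((1, 2), (-2, 1)))   # 0, so these vectors are perpendicular
print(angle((1, 0), (1, 1)))    # approximately pi/4
```

Floating-point arithmetic makes the length and angle approximate, whereas the perpendicularity test `inner(u, v) == 0` is exact for integer components.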

By the early 20th century, Klein was ready to include a smattering of
Grassmann’s ideas in the geometry volume, Klein (1909), of his Elemen- 
tary Mathematics From an Advanced Standpoint. However, by this time 
algebraists had already extended the concept of vector space in a different 
direction. They observed that the fundamental properties of vector spaces, 
such as basis and dimension, do not require the scalars a, b,c, ... to be real 
numbers. The same ideas apply as long as the scalars form a field. 


EXERCISES 


The invariance of basis size, which leads to the concept of dimension, was
proved by Grassmann using the following lemma: if $n$ vectors $\mathbf{u}_1, \ldots, \mathbf{u}_n$ span a
vector space $V$ over a field $F$, then no $n + 1$ vectors $\mathbf{v}_1, \ldots, \mathbf{v}_{n+1}$ are independent.
Supposing the contrary, the proof is by a process of exchanging vectors $\mathbf{u}_i$ by $\mathbf{v}_j$,
one at a time, until all the $\mathbf{u}_i$ are replaced. (The lemma is often called the “Steinitz
exchange lemma,” though it is actually due to Grassmann.)


16.2.1 Suppose we have replaced $m - 1$ of the $\mathbf{u}_i$ by $\mathbf{v}_1, \ldots, \mathbf{v}_{m-1}$, so that $\mathbf{v}_1, \ldots, \mathbf{v}_{m-1}$
and the remaining $\mathbf{u}_i$ span $V$. In particular,

\[
\mathbf{v}_m = a_1\mathbf{v}_1 + \cdots + a_{m-1}\mathbf{v}_{m-1} + \text{terms } b_j\mathbf{u}_j \quad\text{where the } a_i, b_j \in F.
\]

Deduce that some $b_j \neq 0$, and hence that $\mathbf{u}_j$ can be replaced by $\mathbf{v}_m$ in the
spanning set.

16.2.2 Conclude that $\mathbf{v}_1, \ldots, \mathbf{v}_n$ are also a spanning set, and show that this
contradicts the linear independence of $\mathbf{v}_1, \ldots, \mathbf{v}_{n+1}$.


A nice theorem that falls out of an inner product calculation is concurrence
of altitudes of a triangle. An altitude is the line through a vertex of a triangle
perpendicular to the opposite side. Figure 16.1 shows, in an example, that the
three altitudes have a common point. To prove this in general we let the vertices of
the triangle be $\mathbf{u}$, $\mathbf{v}$, $\mathbf{w}$, and choose the zero vector $\mathbf{0}$ to lie at the intersection of the
altitudes through $\mathbf{u}$ and $\mathbf{v}$.


Figure 16.1: Altitudes of a triangle 


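16.2.3 Deduce from this choice of origin that $\mathbf{u} \cdot (\mathbf{w} - \mathbf{v}) = 0$ and $\mathbf{v} \cdot (\mathbf{u} - \mathbf{w}) = 0$.


16.2.4 Deduce in turn that $\mathbf{w} \cdot (\mathbf{v} - \mathbf{u}) = 0$, so that the altitude through $\mathbf{w}$ also passes
through $\mathbf{0}$.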


16.3 Fields 


A field is a collection of objects that are added, subtracted, multiplied, and 
divided according to the rules of traditional algebra. These rules are now 
known as the field axioms: 


\[
\begin{array}{lll}
a + b = b + a & ab = ba & \text{(commutative laws)}\\
a + (b + c) = (a + b) + c & a(bc) = (ab)c & \text{(associative laws)}\\
a + (-a) = 0 & a \cdot a^{-1} = 1 \text{ for } a \neq 0 & \text{(inverse laws)}\\
a + 0 = a & a \cdot 1 = a & \text{(identity laws)}\\
a(b + c) = ab + ac & & \text{(distributive law)}
\end{array}
\]

Thus it could be said that fields were the subject of all algebra up to the
19th century. Up to that time, the “laws of algebra” went without saying, 
as did the subject of those laws. 

In retrospect, we can say that the pre-modern algebraists actually worked 
with several different fields: 


• In basic arithmetic, the field $\mathbb{Q}$ of rational numbers.

• In more sophisticated arithmetic, where roots and logarithms are
present, the field $\mathbb{R}$ of real numbers (though real numbers were not
yet precisely defined).

• In solving polynomial equations, the field $\mathbb{C}$ of complex numbers.

• In the “Universal Arithmetic” of symbolic calculation in “unknowns”
$x, y, z, \ldots$ one had fields of rational functions in several variables.


The concept of field gradually emerged in the 19th century, when
several new, and radically different, fields came to light:

• Finite fields, discovered by Galois in the 1820s. They include the
field $\mathbb{F}_p$, for each prime $p$, of congruence classes of $\mathbb{Z}$ mod $p$.

• The algebraic number fields of Dedekind and Kronecker, which are
subfields of $\mathbb{C}$ consisting of algebraic numbers.


In particular, Dedekind (1871) viewed fields of algebraic numbers as vec- 
tor spaces over Q, and singled out those of finite dimension. Kronecker 
went so far as to claim that an algebraic number is properly realized by a 
field, and that existence of these fields is the proper fundamental theorem 
of algebra, as we will see in Section 16.6. 

The existence of the finite fields $\mathbb{F}_p$ is an easy consequence of the
Euclidean algorithm in $\mathbb{Z}$, which was touched on in Section 14.1. We
review the argument here because it is the prototype for the construction
of algebraic number fields, which we come back to in Section 16.6. The
path from $\mathbb{Z}$, and a prime $p$, to the field $\mathbb{F}_p$ goes as follows.


1. The members of $\mathbb{F}_p$ are the classes $[0], [1], [2], \ldots, [p - 1]$ defined
by congruence mod $p$:

\[
[a] = \{n : n \equiv a \pmod{p}\} = \{\ldots, a - p, a, a + p, a + 2p, \ldots\}.
\]

2. Congruence classes are added and multiplied by the rules

\[
[a] + [b] = [a + b], \qquad [a][b] = [ab].
\]

It must be checked that sum and product of congruence classes are
well-defined; that is, they do not depend on the numbers $a$, $b$ chosen
to represent their congruence classes. But once this is done all the
field properties, except the existence of inverses, follow easily from
the corresponding properties of sum and product for integers.

3. If $[a] \neq [0]$ we find an inverse of the class $[a]$ by the Euclidean
algorithm. Since $p$ is prime, $\gcd(a, p) = 1$, and the Euclidean algorithm
then gives integers $m$ and $n$ such that

\[
1 = \gcd(a, p) = ma + np.
\]

4. In other words, $ma \equiv 1 \pmod{p}$, so

\[
[m][a] = [1],
\]

and hence $[a]$ has the inverse $[m]$.
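
Steps 3 and 4 can be sketched as follows. This is one common way to organize the extended Euclidean algorithm, not necessarily the form intended in Section 14.1, and the names `extended_gcd` and `inverse_mod` are our own.

```python
def extended_gcd(a, b):
    """Return (g, m, n) with g = gcd(a, b) = m*a + n*b."""
    old_r, r = a, b
    old_m, m = 1, 0
    old_n, n = 0, 1
    while r != 0:
        q = old_r // r
        # Each remainder stays an integer combination of a and b.
        old_r, r = r, old_r - q * r
        old_m, m = m, old_m - q * m
        old_n, n = n, old_n - q * n
    return old_r, old_m, old_n

def inverse_mod(a, p):
    """Return a representative m in 0..p-1 of the inverse class [m] of [a]."""
    g, m, _ = extended_gcd(a % p, p)
    if g != 1:
        raise ValueError("[a] has no inverse mod p")
    return m % p

# In F_7: 3 * 5 = 15 = 1 (mod 7), so [3] has inverse [5].
print(inverse_mod(3, 7))
```

When `p` is not prime the same code still runs, but `inverse_mod` raises an error for every `a` that shares a factor with `p`, which is exactly where the field property fails.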


Vector Spaces over a Field 


If F is any field the definition of a vector space over F is identical with 
the definition in Section 16.2, except that the scalars a, b,c,... now come 
from F. In the next two sections we will be particularly interested in vector 
spaces over the field Q of rational numbers. 

$\mathbb{Q}$, as remarked above, is the field of basic arithmetic, so the most
concrete way to approach the irrational numbers arising from polynomial
equations is to view them in relation to rational numbers where
possible. As we will see in the next section, this can be done for the
numbers $\alpha$ that satisfy polynomial equations with rational coefficients, the
so-called algebraic numbers. In this case, $\alpha$ belongs to a vector space of
finite dimension over $\mathbb{Q}$.


EXERCISES 


16.3.1 Find the inverses of [1], [2], [3], [4] mod 5. 


16.3.2 Explain why congruence classes mod 6 do not form a field under addition 
and multiplication of congruence classes.


16.4 Algebraic Numbers and Algebraic Integers


An algebraic number may be defined as one that satisfies a polynomial
equation $p(x) = 0$ with coefficients in $\mathbb{Q}$. Without loss of generality we
can assume this equation is of the form

\[
x^n + a_{n-1}x^{n-1} + \cdots + a_1x + a_0 = 0, \quad\text{where } a_0, a_1, \ldots, a_{n-1} \in \mathbb{Q}. \tag{*}
\]

An algebraic number is said to be of degree $n$ if it satisfies such a
polynomial equation of degree $n$ but not one of lower degree. Thus $\sqrt{2}$, for
example, is of degree 2 because it satisfies the equation $x^2 - 2 = 0$ but not
any equation of the form $ax + b = 0$ with $a, b \in \mathbb{Q}$, since the latter equation
would imply that $\sqrt{2}$ is rational.

An algebraic number $\alpha$ satisfies only one polynomial equation $p(x) = 0$ of
minimal degree and of the form (*). If there were two such equations their
difference would be of lower degree, yet also satisfied by $\alpha$. We call $p(x)$
the minimal polynomial for $\alpha$. The minimal polynomial $p(x)$ is necessarily
prime, or irreducible, because any factors of $p(x)$ would give a
lower-degree polynomial also satisfied by $\alpha$.

This leads, as we will see in Section 16.6, to a close analogy between 
integers modulo a prime p and polynomials modulo an irreducible p(x). 
In particular, the analogy gives the existence of polynomial inverses mod 
p(x), which clarifies the nature of inverses among the algebraic numbers. 


Generating a Field from an Algebraic Number 


Each algebraic number $\alpha$ gives rise to a field $\mathbb{Q}(\alpha)$, which can be viewed
as the smallest field containing $\mathbb{Q}$ and the number $\alpha$.

$\mathbb{Q}(\alpha)$ consists of all quotients $q(\alpha)/r(\alpha)$, where $q$ and $r$ are polynomials
with coefficients in $\mathbb{Q}$. It follows that the sum, difference, product, and
quotient (with nonzero denominator) of any members of $\mathbb{Q}(\alpha)$ is itself a
member of $\mathbb{Q}(\alpha)$. It is also clear that all members belong to $\mathbb{C}$, which has
the field properties, so $\mathbb{Q}(\alpha)$ has them too. Thus $\mathbb{Q}(\alpha)$ is a field, clearly
containing $\alpha$ and all members of $\mathbb{Q}$. Conversely, any number obtainable
from $\alpha$ and members of $\mathbb{Q}$ by sums, differences, products, and quotients is
a member of $\mathbb{Q}(\alpha)$. That is, any field containing $\alpha$ and $\mathbb{Q}$ contains $\mathbb{Q}(\alpha)$. In
this sense, $\mathbb{Q}(\alpha)$ is the “smallest” such field.

In Section 16.6 we will see that $\mathbb{Q}(\alpha)$ may also be fruitfully viewed as
a vector space. In particular, $\mathbb{Q}(\sqrt{2})$ is a vector space over $\mathbb{Q}$ with basis
elements $1$ and $\sqrt{2}$, as can be checked in the exercises below.

It is not obvious, at this stage, whether all members of $\mathbb{Q}(\alpha)$ are
algebraic numbers. Indeed it is not obvious whether $\alpha + \beta$ is an algebraic
number when $\alpha$ and $\beta$ are. The same question can be asked of the algebraic
integers to which we now turn.


Algebraic Integers 


In analogy with the definition of algebraic number, we define an algebraic
integer to be a solution of an equation of the form (called monic because
the leading coefficient is 1)

\[
x^n + a_{n-1}x^{n-1} + \cdots + a_1x + a_0 = 0, \quad\text{where } a_0, a_1, \ldots, a_{n-1} \in \mathbb{Z}. \tag{**}
\]

This definition includes some numbers that look like integers, such as the
Gaussian integers $a + b\sqrt{-1}$, where $a, b \in \mathbb{Z}$, but also some that do not,
such as $\frac{-1 + \sqrt{-3}}{2}$, which is a solution of $x^3 - 1 = 0$. Nevertheless, it turns
out that algebraic integers are the right counterpart of ordinary integers
among the algebraic numbers. In particular, the algebraic integers among
the rational numbers are the ordinary integers. The definition (**) was
proposed by Dedekind (1871), in the light of extensive experience with
algebraic numbers. Another piece of evidence that supports (**) is the
result of Eisenstein (1850) that sums and products of numbers satisfying
(**) are also numbers of this form.

An interesting feature of Eisenstein’s result is the use of determinant 
theory, from the hard core of Leibniz-era linear algebra that modern linear 
algebra tries to avoid. For those familiar with determinants, the argument 
is outlined in the exercises below. The corresponding result about sum and 
product of algebraic numbers, as we will see in the next three sections, is 
obtainable by softer methods. 


EXERCISES 


16.4.1 Show that the sum, difference, and product of numbers of the form $a + b\sqrt{2}$
are again of this form, and so is $\frac{1}{a + b\sqrt{2}}$.

16.4.2 Deduce from Exercise 16.4.1 that $\mathbb{Q}(\sqrt{2}) = \{a + b\sqrt{2} : a, b \in \mathbb{Q}\}$.

16.4.3 Prove that $x = \sqrt{2} + \sqrt{3}$ is an algebraic integer by finding a suitable
fourth-degree polynomial satisfied by $x$.

16.4.4 Suppose that $x = r/s$ is a rational algebraic integer, so that

\[
\left(\frac{r}{s}\right)^n + a_{n-1}\left(\frac{r}{s}\right)^{n-1} + \cdots + a_1\frac{r}{s} + a_0 = 0 \quad\text{where } a_0, a_1, \ldots, a_{n-1} \in \mathbb{Z}.
\]

Deduce that $r^n$ is divisible by $s$.

16.4.5 Assuming now that $\gcd(r, s) = 1$ in Exercise 16.4.4, conclude that $s = 1$,
and hence that $x$ is an ordinary integer.


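We now prove that the sum of algebraic integers is an algebraic integer. Suppose
$\alpha$ satisfies an equation $\alpha^k + a_{k-1}\alpha^{k-1} + \cdots + a_1\alpha + a_0 = 0$, with $a_0, a_1, \ldots, a_{k-1} \in \mathbb{Z}$.

16.4.6 Observing that

\[
\begin{aligned}
\alpha^k &= -a_{k-1}\alpha^{k-1} - \cdots - a_1\alpha - a_0,\\
\alpha^{k+1} &= -a_{k-1}\alpha^k - \cdots - a_1\alpha^2 - a_0\alpha,
\end{aligned}
\]

and so on, explain why every polynomial in $\alpha$ with integer coefficients is a
linear combination of $1, \alpha, \alpha^2, \ldots, \alpha^{k-1}$ with coefficients in $\mathbb{Z}$.

Similarly, if $\beta$ satisfies a monic polynomial equation of degree $l$, any polynomial in
$\beta$ is a linear combination of $1, \beta, \ldots, \beta^{l-1}$ with integer coefficients. Consequently,
any polynomial in $\alpha$ and $\beta$ is a linear combination of terms $\alpha^i\beta^j$, with $0 \le i \le k - 1$
and $0 \le j \le l - 1$, with integer coefficients.

16.4.7 Denoting the $kl$ products $\alpha^i\beta^j$ by $\omega_1, \ldots, \omega_{kl}$, conclude that we can write each
polynomial $\omega$ in $\alpha$ and $\beta$ (such as $\alpha + \beta$ or $\alpha\beta$) in the form

\[
\omega = n_1\omega_1 + \cdots + n_{kl}\omega_{kl} \quad\text{where } n_1, \ldots, n_{kl} \in \mathbb{Z}. \tag{*}
\]

16.4.8 Deduce from (*) that $\omega$ satisfies $kl$ equations in the $kl$ unknowns $\omega_i$, with
integer coefficients:

\[
\begin{aligned}
\omega\omega_1 &= n'_1\omega_1 + \cdots + n'_{kl}\omega_{kl}\\
\omega\omega_2 &= n''_1\omega_1 + \cdots + n''_{kl}\omega_{kl}\\
&\ \,\vdots\\
\omega\omega_{kl} &= n^{(kl)}_1\omega_1 + \cdots + n^{(kl)}_{kl}\omega_{kl}
\end{aligned}
\]

16.4.9 Conclude that these equations have zero determinant, that is,

\[
\det\begin{pmatrix}
n'_1 - \omega & n'_2 & \cdots & n'_{kl}\\
n''_1 & n''_2 - \omega & \cdots & n''_{kl}\\
\vdots & \vdots & & \vdots\\
n^{(kl)}_1 & n^{(kl)}_2 & \cdots & n^{(kl)}_{kl} - \omega
\end{pmatrix} = 0.
\]

Explain why this shows that $\omega$ is an algebraic integer.


16.5 Rings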


The ring axioms are modelled on the properties of the ordinary integers: 


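\[
\begin{array}{lll}
a + b = b + a & ab = ba & \text{(commutative laws)}\\
a + (b + c) = (a + b) + c & a(bc) = (ab)c & \text{(associative laws)}\\
a + (-a) = 0 & & \text{(inverse law)}\\
a + 0 = a & a \cdot 1 = a & \text{(identity laws)}\\
a(b + c) = ab + ac & & \text{(distributive law)}
\end{array}
\]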


These ring axioms were formulated to capture the common proper- 
ties of ordinary integers and the algebraic integers defined in Section 16.4. 
Special cases of algebraic integers were first introduced by Euler and Gauss 
to solve problems about ordinary integers. 

For example, Euler (1770b) used “integers” of the form $a + b\sqrt{-2}$, for
ordinary integers $a$ and $b$, to find the ordinary integer solutions of

\[
y^3 = x^2 + 2.
\]

His idea was to factorize the right hand side as $(x + \sqrt{-2})(x - \sqrt{-2})$ and to
argue that $x + \sqrt{-2}$ and $x - \sqrt{-2}$ behave like relatively prime integers. Then,
assuming that unique prime factorization holds among the “integers”
$a + b\sqrt{-2}$, it follows that both $x + \sqrt{-2}$ and $x - \sqrt{-2}$ are cubes, and a simple
calculation leads to the single positive solution $x = 5$, $y = 3$. (See the
exercises below.)
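
A brute-force search is no substitute for Euler's argument, since it can only examine finitely many cases, but it does confirm the claim over a small range. The sketch below uses our own function names and an arbitrarily chosen bound; it looks for positive integers $x$ with $x^2 + 2$ a perfect cube.

```python
def integer_cube_root(n):
    """Return the integer y >= 0 with y**3 == n, or None if there is none."""
    y = round(n ** (1 / 3))
    # The floating-point estimate may be off by one, so test its neighbours.
    for candidate in (y - 1, y, y + 1):
        if candidate >= 0 and candidate ** 3 == n:
            return candidate
    return None

def solutions(limit):
    """All (x, y) with 1 <= x <= limit and y**3 == x**2 + 2."""
    found = []
    for x in range(1, limit + 1):
        y = integer_cube_root(x * x + 2)
        if y is not None:
            found.append((x, y))
    return found

print(solutions(2000))  # [(5, 3)]
```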

This spectacular extension of classical arithmetic reasoning to new 
kinds of “integer” prompts a broader definition of integers, and a study of 
the primes among them. The ring axioms capture the fundamental proper- 
ties of integers, but they do not imply unique prime factorization. Section 
16.8 discusses how to refine the ring concept so as to ensure unique prime 
factorization, but first we will deal with an important case where unique 
prime factorization holds: a polynomial ring over a field, $F[x]$.


Polynomial Rings 


Polynomials have been studied since the invention of algebraic notation. 
As early as 1585 Stevin observed that they behave like integers in an 
important way: they enjoy “division with remainder” in the following sense. 
If $a(x)$ and $b(x) \neq 0$ are polynomials then there is a “quotient” polynomial
$q(x)$ and “remainder” polynomial $r(x)$ such that

\[
a(x) = b(x)q(x) + r(x),
\]

and $r(x)$ is “smaller” than $b(x)$ in the sense of having lower degree (with
the special case that 0 is taken to have lower degree than a nonzero
constant).

To see why the division property holds, suppose that

\[
a(x) = a_nx^n + \cdots + a_1x + a_0,
\]

\[
b(x) = b_mx^m + \cdots + b_1x + b_0
\]

with $m \le n$ (otherwise $a(x)$ itself can serve as $r(x)$, with $q(x) = 0$). In this
case $a(x) - b(x) \cdot \frac{a_n}{b_m}x^{n-m}$ is a polynomial $a'(x)$ of degree $n' < n$, because the
subtraction removes the term $a_nx^n$ in $a(x)$. Then if $m \le n'$ we can repeat
the process, eventually obtaining a polynomial $r(x)$ of degree $< m$. The
various multipliers of $b(x)$ used in this process add up to the quotient $q(x)$.

Notice that we use only addition, subtraction, multiplication, and divi- 
sion of coefficients, so the division property holds for polynomials with 
coefficients from any field F. These polynomials form a ring called F[x]. 
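
The repeated subtraction just described can be sketched in Python for the field $\mathbb{Q}$, using exact rationals from the standard `fractions` module. The representation of a polynomial as a list of coefficients with the constant term first, and the names `trim` and `poly_divmod`, are our own assumptions for illustration.

```python
from fractions import Fraction

def trim(p):
    """Drop trailing zero coefficients; [] represents the zero polynomial."""
    p = list(p)
    while p and p[-1] == 0:
        p.pop()
    return p

def poly_divmod(a, b):
    """Return (q, r) with a = b*q + r and deg r < deg b.

    Polynomials are coefficient lists [c0, c1, ..., cn], constant term first.
    """
    a = [Fraction(c) for c in trim(a)]
    b = [Fraction(c) for c in trim(b)]
    if not b:
        raise ZeroDivisionError("division by the zero polynomial")
    q = [Fraction(0)] * max(len(a) - len(b) + 1, 0)
    r = a
    while len(r) >= len(b):
        shift = len(r) - len(b)      # the exponent n - m
        factor = r[-1] / b[-1]       # the coefficient a_n / b_m
        q[shift] += factor
        for i, c in enumerate(b):    # subtract factor * x^shift * b(x)
            r[i + shift] -= factor * c
        r = trim(r)                  # the leading term has been removed
    return trim(q), r

# x^3 - 2x^2 + 4 = (x - 3)(x^2 + x + 3) + 13
print(poly_divmod([4, 0, -2, 1], [-3, 1]))
```

Only the four field operations on coefficients are used, so replacing `Fraction` by another exact field type would give division in $F[x]$ for that field.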

Now that we have the division property for F[x], a Euclidean algorithm 
follows, along with all its usual consequences: 


• Any polynomials $a(x)$ and $b(x)$ have a divisor $\gcd(a(x), b(x))$ which
is greatest in the sense that it is divisible by any other polynomial
dividing both $a(x)$ and $b(x)$.

• $\gcd(a(x), b(x)) = m(x)a(x) + n(x)b(x)$ for some polynomials $m(x)$, $n(x)$
in $F[x]$.

• If $p(x)$ is an irreducible polynomial, and $p(x)$ does not divide $a(x)$,
then $\gcd(a(x), p(x)) = 1$ (or any other nonzero member of $F$, since
all of them divide 1).

• (Prime divisor property) If $p(x)$ is an irreducible polynomial that
divides $a(x)b(x)$, then $p(x)$ divides $a(x)$ or $p(x)$ divides $b(x)$.

• (Unique prime factorization) Any polynomial in $F[x]$ has a
factorization into irreducibles, which is unique up to the order of factors
and nonzero factors from $F$.

In particular, polynomials in $\mathbb{Q}[x]$ have factorization into irreducibles, which
is unique up to the order of factors and nonzero rational factors.


EXERCISES 


Determining whether a polynomial in Q[x] is irreducible is often a difficult 
problem, but in low-degree cases we can appeal to proofs that certain numbers are 
irrational. 


16.5.1 Prove that $\sqrt[3]{2}$ is irrational and deduce that $x^3 - 2$ is irreducible in $\mathbb{Q}[x]$.


Here is part of Euler’s solution of $y^3 = x^2 + 2$, using the integers $a + b\sqrt{-2}$
and assuming their unique prime factorization.


16.5.2 Assuming $x + \sqrt{-2} = (a + b\sqrt{-2})^3$ for $a, b \in \mathbb{Z}$, equate real and imaginary
parts to find the only positive integer solution of $y^3 = x^2 + 2$.


Unique prime factorization in $\mathbb{Z}[\sqrt{-2}]$ is proved, as in $\mathbb{Z}$, by proving a division
property, which yields a Euclidean algorithm, prime divisor property, and hence
unique prime factorization. We illustrate the idea first with the Gaussian integers
$\mathbb{Z}[i]$, the smaller members of which are shown as dots in Figure 16.2. The figure
also shows the multiples of $3 + i$ among them as black dots, and the particular
Gaussian integer $5 + 3i$ as a gray dot.


Figure 16.2: Multiples of 3 + i near 5 + 3i (labeled points: 0, 3 + i, i(3 + i), (1 + i)(3 + i), (2 + i)(3 + i), and 5 + 3i)


16.5.3 Explain why the multiples of $3 + i$ form an array of squares, like the array
$\mathbb{Z}[i]$ itself but magnified and rotated.


16.5.4 Explain in general why the multiples $\mu\beta$ of a Gaussian integer $\beta$ form an
array of squares, each of side length $|\beta|$.


16.5.5 Show also that any Gaussian integer $\alpha$ (such as $5 + 3i$) lies at distance
$|\alpha - \mu\beta| < |\beta|$ from the nearest multiple $\mu\beta$ of $\beta$.


16.5.6 Deduce from Exercise 16.5.5 that, for any Gaussian integers $\alpha$ and $\beta \neq 0$,
there are Gaussian integers $\mu$, $\rho$ with the division property:

\[
\alpha = \mu\beta + \rho, \quad\text{where } |\rho| < |\beta|.
\]


16.5.7 Show similarly that $\mathbb{Z}[\sqrt{-2}]$ has the division property, and hence unique
prime factorization.


16.6 Fields as Vector Spaces 


Now suppose that $\alpha$ is an algebraic number with minimal polynomial
$p(x) = x^n + a_{n-1}x^{n-1} + \cdots + a_1x + a_0$ of degree $n$. We notice that

\[
\alpha^n = -a_{n-1}\alpha^{n-1} - \cdots - a_1\alpha - a_0
\]

and, more generally, any higher power of $\alpha$ is a linear combination of
$1, \alpha, \alpha^2, \ldots, \alpha^{n-1}$ with rational coefficients.

So the set $\mathbb{Q}[\alpha]$ of all polynomials in $\alpha$ with rational coefficients, which
is clearly a vector space, in fact equals the set of rational combinations
of $1, \alpha, \alpha^2, \ldots, \alpha^{n-1}$, which is a vector space of dimension $n$ over $\mathbb{Q}$. We
have just seen that the elements $1, \alpha, \alpha^2, \ldots, \alpha^{n-1}$ span $\mathbb{Q}[\alpha]$, and they are
linearly independent because an equation

\[
b_{n-1}\alpha^{n-1} + \cdots + b_1\alpha + b_0 = 0, \quad\text{with } b_0, b_1, \ldots, b_{n-1} \in \mathbb{Q} \text{ not all zero},
\]

contradicts the assumption that $\alpha$ has degree $n$.

Moreover, the vector space $\mathbb{Q}[\alpha]$ is a field. It is clear that the sum,
difference, and product of any two members of $\mathbb{Q}[\alpha]$ also belongs to $\mathbb{Q}[\alpha]$.
Thus it remains to show that the inverse $\beta^{-1}$ of any nonzero $\beta \in \mathbb{Q}[\alpha]$ is also a
member. We do this in the same way we found inverses mod $p$ in
Section 16.3, by considering the relation of congruence mod $p(x)$ and
appealing to the Euclidean algorithm for polynomials.


Congruence Modulo an Irreducible Polynomial


Instead of the ring $\mathbb{Z}$ and a prime $p \in \mathbb{Z}$, we now take the ring $\mathbb{Q}[x]$
of polynomials with rational coefficients and an irreducible polynomial
$p(x) \in \mathbb{Q}[x]$. We introduce the notion of congruence mod $p(x)$ by saying

\[
a(x) \equiv b(x) \pmod{p(x)}
\]

if $p(x)$ divides $a(x) - b(x)$. This gives congruence classes of polynomials,
$[a(x)]$, which can be added and multiplied by the rules

\[
[a(x)] + [b(x)] = [a(x) + b(x)] \quad\text{and}\quad [a(x)] \cdot [b(x)] = [a(x) \cdot b(x)].
\]

We verify, exactly as we did for congruence classes of numbers modulo a
prime in Section 16.3, that this sum and product are well defined and have
the ring properties.

Finally, and again by the same argument as in Section 16.3, using the
Euclidean algorithm, we find that each nonzero class $[a(x)]$ has an inverse
class $[m(x)]$, in the sense that

\[
[m(x)][a(x)] = [1].
\]

Thus the ring of congruence classes of polynomials in $\mathbb{Q}[x]$, modulo
an irreducible $p(x)$, is a field. We now return to the vector space $\mathbb{Q}[\alpha]$ to
claim that its members are essentially the same as congruence classes of
polynomials in $\mathbb{Q}[x]$ mod $p(x)$, so $\mathbb{Q}[\alpha]$ is also a field.
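
To make the inverse class concrete, here is a minimal Python sketch of the extended Euclidean algorithm in $\mathbb{Q}[x]$. Polynomials are assumed to be lists of coefficients with the constant term first, and all function names are our own. For $p(x) = x^2 - 2$ it recovers the familiar facts that the class of $x$ has inverse $x/2$ and the class of $1 + x$ has inverse $x - 1$.

```python
from fractions import Fraction

def trim(p):
    """Drop trailing zero coefficients; [] represents the zero polynomial."""
    p = list(p)
    while p and p[-1] == 0:
        p.pop()
    return p

def add(a, b):
    n = max(len(a), len(b))
    a = list(a) + [0] * (n - len(a))
    b = list(b) + [0] * (n - len(b))
    return trim([x + y for x, y in zip(a, b)])

def mul(a, b):
    if not a or not b:
        return []
    out = [Fraction(0)] * (len(a) + len(b) - 1)
    for i, x in enumerate(a):
        for j, y in enumerate(b):
            out[i + j] += x * y
    return trim(out)

def divmod_poly(a, b):
    """Return (q, r) with a = b*q + r and deg r < deg b (b nonzero, trimmed)."""
    q, r = [], trim(a)
    while len(r) >= len(b):
        term = [Fraction(0)] * (len(r) - len(b)) + [Fraction(r[-1]) / b[-1]]
        q = add(q, term)
        r = add(r, mul([-c for c in term], b))  # removes the leading term of r
    return q, r

def inverse_mod(a, p):
    """Return m(x) with m(x)a(x) congruent to 1 mod p(x)."""
    # Invariant: old_r = old_m * a and r = m * a, both mod p.
    old_r, r = trim(p), divmod_poly(a, p)[1]
    old_m, m = [], [Fraction(1)]
    while r:
        q, rem = divmod_poly(old_r, r)
        old_r, r = r, rem
        old_m, m = m, add(old_m, mul([-c for c in q], m))
    if len(old_r) != 1:
        raise ValueError("a(x) is not invertible mod p(x)")
    return [c / old_r[0] for c in old_m]  # scale so that the gcd becomes 1

p = [-2, 0, 1]                  # x^2 - 2
print(inverse_mod([0, 1], p))   # coefficients of x/2
print(inverse_mod([1, 1], p))   # coefficients of x - 1
```

The error branch is reached only when $a(x)$ and $p(x)$ have a nonconstant common divisor, which cannot happen for a nonzero class when $p(x)$ is irreducible.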

In fact, we get a one-to-one correspondence between the congruence
classes and elements of $\mathbb{Q}[\alpha]$ by letting each $[t(x)]$ correspond to $t(\alpha)$. This
correspondence is one-to-one because, for any polynomial $t(x) \in \mathbb{Q}[x]$,

\[
p(x) \text{ divides } t(x) \iff t(\alpha) = 0.
\]

The direction $\Rightarrow$ is clear because $p(\alpha) = 0$. Conversely, suppose $t(\alpha) = 0$
and consider the result of dividing $t(x)$ by $p(x)$. By the division property,

\[
t(x) = q(x)p(x) + r(x),
\]

where $r(x)$ has lower degree than $p(x)$. Then, since $t(\alpha)$ and $p(\alpha)$ are 0,
$r(\alpha) = 0$ also. This contradicts the minimality of $p(x)$, unless $r(x) = 0$, in
which case $p(x)$ divides $t(x)$. It follows now that, for any $u(x), v(x) \in \mathbb{Q}[x]$,

\[
\begin{aligned}
u(x) \text{ and } v(x) \text{ are in the same class} &\iff p(x) \text{ divides } u(x) - v(x)\\
&\iff u(\alpha) - v(\alpha) = 0\\
&\iff u(\alpha) = v(\alpha),
\end{aligned}
\]

so congruence classes mod $p(x)$ correspond to values in $\mathbb{Q}[\alpha]$.


Revisiting the Fundamental Theorem of Algebra


The proof above realizes the field $\mathbb{Q}[\alpha]$, where $\alpha$ is typically irrational or
imaginary, by a concrete collection of rational objects; namely,
polynomials with rational coefficients. For example, the field $\mathbb{Q}(\sqrt{2})$ involving the
irrational number $\sqrt{2}$ is, for all algebraic purposes, the same as the
collection of linear polynomials $ax + b$, where $a, b \in \mathbb{Q}$. These polynomials are
added and multiplied in the usual way with the proviso that $x^2 - 2 = 0$,
and $\sqrt{2}$ itself corresponds to the congruence class of $x$.¹

The general idea of replacing algebraic numbers, and the fields they
generate, by congruence classes of rational polynomials, was proposed by
Kronecker (1887). Kronecker was opposed to irrational numbers, to large
infinite totalities like $\mathbb{R}$ and $\mathbb{C}$, and especially to pure existence proofs,
where objects were shown to exist without being constructed. For all these
reasons he objected to the fundamental theorem of algebra. He believed
that it should be replaced by what he called the “fundamental theorem of
general arithmetic,” an instance of which is the realization of $\mathbb{Q}(\alpha)$ by the
field of congruence classes mod $p(x)$ in $\mathbb{Q}[x]$.

This field, though infinite, can be constructed step by step using only 
rational numbers, and it contains a solution to the polynomial equation 
p(x) = 0, namely the equivalence class of x, mod p(x). Thus, if one prefers 
a fundamental theorem in which roots of polynomial equations are con- 
structed as simply as possible, congruence classes of rational polynomials 
are the way to go. For more on Kronecker’s view of the fundamental the- 
orem of algebra, see Edwards (2007). 


EXERCISES 


16.6.1 Prove that $\mathbb{Q}(2^{1/3}) = \{a + b2^{1/3} + c2^{2/3} : a, b, c \in \mathbb{Q}\}$.


16.7 Fields of Algebraic Numbers 


As we mentioned in Section 16.4, it is not obvious that $\alpha + \beta$ is an algebraic
number when $\alpha$ and $\beta$ are. One proof of this fact uses determinants, but
a simpler proof follows from a general theorem about the dimension of
vector spaces pointed out by Dedekind (1894).


¹In a similar way the field of complex numbers is realized by real linear polynomials
$a + bx$ with the proviso that $x^2 + 1 = 0$. The latter example was actually proposed by Cauchy
(1847) as a rigorous approach to complex numbers.

Dimension Theorem. For fields $D \subseteq E \subseteq F$, with $E$ of dimension $m$ over
$D$ and $F$ of dimension $n$ over $E$, $F$ has dimension $mn$ over $D$.

Proof. If $e_1, e_2, \ldots, e_m$ is a basis for $E$ over $D$, each $e \in E$ can be written

\[
e = d_1e_1 + d_2e_2 + \cdots + d_me_m \quad\text{for some } d_1, d_2, \ldots, d_m \in D.
\]

Likewise, if $f_1, f_2, \ldots, f_n$ is a basis for $F$ over $E$, each $f \in F$ can be written

\[
f = e'_1f_1 + e'_2f_2 + \cdots + e'_nf_n \quad\text{for some } e'_1, e'_2, \ldots, e'_n \in E.
\]

These equations imply any $f \in F$ can be written as a linear combination of
the elements $e_if_j$ with coefficients $d_{ij} \in D$.

Thus the $mn$ elements $e_if_j$ span $F$ over $D$. Also, they are linearly
independent. Supposing

\[
\begin{aligned}
0 = {} & d_{11}e_1f_1 + d_{12}e_1f_2 + \cdots + d_{1n}e_1f_n\\
& + d_{21}e_2f_1 + d_{22}e_2f_2 + \cdots + d_{2n}e_2f_n\\
& \ \,\vdots\\
& + d_{m1}e_mf_1 + d_{m2}e_mf_2 + \cdots + d_{mn}e_mf_n
\end{aligned}
\]

it follows, since $f_1, f_2, \ldots, f_n$ are linearly independent over $E$, that their
coefficients are zero. That is

\[
\begin{aligned}
0 &= d_{11}e_1 + d_{21}e_2 + \cdots + d_{m1}e_m\\
0 &= d_{12}e_1 + d_{22}e_2 + \cdots + d_{m2}e_m\\
&\ \,\vdots\\
0 &= d_{1n}e_1 + d_{2n}e_2 + \cdots + d_{mn}e_m
\end{aligned}
\]

which in turn implies each $d_{ij} = 0$, because $e_1, e_2, \ldots, e_m$ are linearly
independent over $D$. □


To apply this theorem we suppose $\alpha$ is an algebraic number of degree
$m$, so $\mathbb{Q}[\alpha]$ is a vector space of dimension $m$ over $\mathbb{Q}$. Now if $\beta$ is an
algebraic number of degree $n$ then the vector space

\[
(\mathbb{Q}[\alpha])[\beta] = \{\text{polynomials in } \beta \text{ with coefficients in } \mathbb{Q}[\alpha]\}
\]

has dimension at most $n$ over the field $\mathbb{Q}[\alpha]$ because $\beta^n$ is a linear
combination of $1, \beta, \beta^2, \ldots, \beta^{n-1}$ with rational coefficients, hence certainly with
coefficients in $\mathbb{Q}[\alpha]$. The same applies to the higher powers $\beta^{n+1}, \beta^{n+2}, \ldots$,
by the argument used in Section 16.6. Thus the elements $1, \beta, \beta^2, \ldots, \beta^{n-1}$
span $(\mathbb{Q}[\alpha])[\beta]$, which therefore has dimension $\le n$ as a vector space over
$\mathbb{Q}[\alpha]$.

It follows, by the dimension theorem, that the dimension of $(\mathbb{Q}[\alpha])[\beta]$
over $\mathbb{Q}$ is at most $mn$. Now $\alpha + \beta$ clearly belongs to $(\mathbb{Q}[\alpha])[\beta]$, so it is
algebraic, of degree $\le mn$, by the following simple theorem.


Field of finite dimension over $\mathbb{Q}$. In a field $F$ of dimension $d$ over $\mathbb{Q}$,
each element is an algebraic number of degree $\le d$.


Proof. If $\gamma \in F$, where $F$ has dimension $d$ over $\mathbb{Q}$, the $d + 1$ elements
$1, \gamma, \gamma^2, \ldots, \gamma^d$ cannot be linearly independent. Hence there are rational
numbers $a_0, a_1, \ldots, a_d$, not all zero, such that

\[
a_0 + a_1\gamma + \cdots + a_d\gamma^d = 0.
\]

This equation shows that $\gamma$ is algebraic, of degree $\le d$. □
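
As a small worked instance of this proof (our own illustration, with the element chosen arbitrarily), take the field of numbers $a + b\sqrt{2}$ with $a, b$ rational, which has dimension $d = 2$ over the rationals, and the element $\gamma = 1 + \sqrt{2}$. The three elements $1, \gamma, \gamma^2$ must be linearly dependent, and the dependence can be found by comparing coordinates with respect to the basis $1, \sqrt{2}$:

```latex
\begin{aligned}
\gamma   &= 1 + \sqrt{2},\\
\gamma^2 &= 1 + 2\sqrt{2} + 2 = 3 + 2\sqrt{2} = 2(1 + \sqrt{2}) + 1 = 2\gamma + 1,\\
\text{so}\quad 0 &= -1 - 2\gamma + \gamma^2
\qquad (a_0 = -1,\ a_1 = -2,\ a_2 = 1).
\end{aligned}
```

Thus $\gamma$ satisfies $x^2 - 2x - 1 = 0$ and is algebraic of degree at most 2.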

This argument has consequences both for the “small” fields $\mathbb{Q}(\alpha)$ of
Section 16.4, each generated by a single algebraic number, and the
collection of all algebraic numbers.


Corollary 1 When $\alpha$ is an algebraic number, all members of $\mathbb{Q}(\alpha)$ are
algebraic.


Proof. If $\alpha$ is of degree $d$, then $\mathbb{Q}(\alpha)$ equals the vector space $\mathbb{Q}[\alpha]$ of
dimension $d$ over $\mathbb{Q}$ by the argument in Section 16.6. Then each member of $\mathbb{Q}(\alpha)$
is an algebraic number of degree $\le d$ by the theorem above. □


Corollary 2 The set of all algebraic numbers is closed under the
operations $+$, $-$, $\times$, and $\div$ (by nonzero elements), and hence is a field.


Proof. If $\alpha$ and $\beta \neq 0$ are algebraic numbers then not only $\alpha + \beta$ but also
$\alpha - \beta$, $\alpha\beta$, and $\alpha/\beta$ belong to $(\mathbb{Q}[\alpha])[\beta]$, which is of finite dimension over
$\mathbb{Q}$. Hence they are algebraic by the theorem above. □


EXERCISES 


An algebraic problem originating in Euclid’s geometry is the problem of
constructible numbers. Geometrically speaking, a number $\alpha$ is constructible if the
corresponding length is constructible from the unit length by ruler and compass.
With the arithmetization of geometry by Descartes (1637), we saw in the
exercises to Section 5.3 that an equivalent algebraic statement is that $\alpha$ is obtainable
from 1 by the rational operations $+$, $-$, $\times$, $\div$ and the $\sqrt{\ }$ operation. We can now
revisit the question of whether $\sqrt[3]{2}$ is constructible, first solved in the exercises to
Section 5.4.


16.7.1 Explain why an equivalent question is whether there are fields

\[
\mathbb{Q} = F_0 \subseteq F_1 \subseteq \cdots \subseteq F_k,
\]

with each $F_{i+1}$ of dimension 2 over $F_i$, and $\sqrt[3]{2} \in F_k$.


16.7.2 Deduce from the dimension theorem that $\mathbb{Q}(\alpha)$ has dimension $2^k$, for some
$k \ge 0$, when $\alpha$ is a constructible number, and that $\alpha$ has degree $2^k$.


16.7.3 Conclude, with the help of Exercise 16.5.1, that $\sqrt[3]{2}$ is not constructible.


16.8 Ideals 


As we saw in the exercises to Section 16.4, if $\alpha$ and $\beta$ are algebraic
integers, then so are $\alpha + \beta$, $\alpha - \beta$, and $\alpha\beta$. Thus the set of all algebraic integers
is closed under the operations of $+$, $-$, and $\times$, and hence it is a ring. But
it is not a good ring for doing arithmetic, because it has no “primes”: if $\alpha$
is an algebraic integer then so is $\sqrt{\alpha}$, hence every algebraic integer has a
nontrivial factorization $\alpha = \sqrt{\alpha}\sqrt{\alpha}$.

Dedekind (1871) found that the right setting for arguments about algebraic
integers and primes, such as Euler’s solution of the equation
$y^3 = x^2 + 2$ mentioned in Section 16.5, is in fields of finite dimension
over $\mathbb{Q}$. Euler’s solution uses the algebraic integers in the
two-dimensional field $\mathbb{Q}[\sqrt{-2}]$, which happen to be precisely the
numbers of the form $a + b\sqrt{-2}$, where $a, b \in \mathbb{Z}$.

In this example, one can prove that unique prime factorization holds in
$\mathbb{Q}[\sqrt{-2}]$ with the help of a measure of size for the integers
$a + b\sqrt{-2}$ called their norm:

$$\operatorname{norm}(a + b\sqrt{-2}) = a^2 + 2b^2.$$

This norm is simply the square of the absolute value $|a + b\sqrt{-2}|$, and
hence it has the multiplicative property:

$$\operatorname{norm}(\alpha\beta) = \operatorname{norm}(\alpha)\operatorname{norm}(\beta).$$


We call an algebraic integer of $\mathbb{Q}[\sqrt{-2}]$ prime if it has norm greater
than 1 and is not the product of algebraic integers of smaller norm. The
existence of prime factorization then follows as in $\mathbb{Z}$. Since norms are
ordinary positive integers, the process of splitting an integer into factors
of smaller norm must terminate (since positive integers cannot decrease
forever), necessarily in factors that are prime.

The situation is the same in any algebraic number field of finite dimension.
Dedekind (1871) showed that each such field has a concept of norm, which is
multiplicative and integer-valued for algebraic integers, so primes and prime
factorizations exist. However, prime factorization is not always unique.
Euler (1770b) was lucky to pick $\mathbb{Q}[\sqrt{-2}]$, because it does indeed have
unique prime factorization, as we showed in Exercise 16.5.7. On the other
hand, $\mathbb{Q}[\sqrt{-5}]$ does not.

In $\mathbb{Q}[\sqrt{-5}]$ the integers are the numbers of the form $a + b\sqrt{-5}$,
where $a, b \in \mathbb{Z}$, among which the integer 6 has the factorizations

$$6 = 2 \cdot 3 = (1 + \sqrt{-5})(1 - \sqrt{-5}).$$

The norm of $a + b\sqrt{-5}$ is $a^2 + 5b^2$, and it can be checked that each of
2, 3, $1 + \sqrt{-5}$, $1 - \sqrt{-5}$ is prime according to this norm, so the
number 6 has two distinct prime factorizations.
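
The check that 2, 3, and $1 \pm \sqrt{-5}$ are prime can be mechanized. The following is a minimal Python sketch, not a method taken from the text: the function names are hypothetical, and the test it applies is only a sufficient one. Because the norm is multiplicative, a factorization into elements of smaller norm would split the norm $n$ as $n = d \cdot (n/d)$ with both $d > 1$ and $n/d > 1$ themselves values of the norm; if no such splitting exists, the element is prime in the sense defined above.

```python
from itertools import product

def norm(a, b):
    """Norm of a + b*sqrt(-5), namely a^2 + 5*b^2."""
    return a * a + 5 * b * b

def norms_up_to(limit):
    """All values <= limit taken by the norm on elements a + b*sqrt(-5)."""
    bound = int(limit ** 0.5) + 1      # |a| and |b| cannot exceed sqrt(limit)
    values = set()
    for a, b in product(range(-bound, bound + 1), repeat=2):
        if norm(a, b) <= limit:
            values.add(norm(a, b))
    return values

def has_no_norm_splitting(a, b):
    """True if the norm n of a + b*sqrt(-5) exceeds 1 and cannot be written
    as d * (n // d) with both factors > 1 and both attained as norms.
    This is sufficient (not necessary) for the element to be prime."""
    n = norm(a, b)
    attained = norms_up_to(n)
    return n > 1 and not any(
        d in attained and n // d in attained
        for d in range(2, n) if n % d == 0
    )

for a, b in [(2, 0), (3, 0), (1, 1), (1, -1)]:
    print((a, b), "norm", norm(a, b), "prime:", has_no_norm_splitting(a, b))
```

The four elements have norms 4, 9, 6, 6, and the only candidate splittings involve the values 2 and 3, neither of which has the form $a^2 + 5b^2$. The number 6 itself fails the test, since its norm 36 splits as $4 \cdot 9$.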

Failure of unique prime factorization among the algebraic integers was 
first noticed by Kummer in the 1840s, and he realized that it is a serious 
problem. He wrote: 


It is greatly to be lamented that this virtue of the real num- 
bers [that is, of the ordinary integers] to be decomposable 
into prime factors, always the same ones for a given number, 
does not also belong to the complex numbers [that is, the alge- 
braic integers]; were this the case, the whole theory, which is 
still laboring under such difficulties, could easily be brought 
to a conclusion. For this reason, the complex numbers we 
have been considering seem imperfect, and one may well ask 
whether one ought not to look for another kind which would 
preserve the analogy with the real numbers with respect to 
such a fundamental property. 


Translation by Weil (1975) from Kummer (1844) 


Dedekind (1877) put himself in Kummer’s shoes as he described this 
turning point in the history of algebra: 


But the more hopeless one feels about the prospects of later
research on such numerical domains, the more one has to
admire the steadfast efforts of Kummer, which were finally
rewarded by a truly great and fruitful discovery.


Dedekind (1877), p. 56. 


Kummer found “another kind” of number that overcame the failure of
unique prime factorization, and he called them ideal numbers, though he
did not properly define them. Today we know them under the concept of
ideals, introduced by Dedekind (1871) to formalize Kummer’s idea, and
to generalize it to all rings of algebraic integers in algebraic number fields
of finite dimension. The idea, roughly speaking, is that a number may be
known by its set of multiples. Dedekind realized that a set $I$ of multiples
in a ring $R$ has two key properties:


• If $\alpha, \beta \in I$ then $\alpha + \beta \in I$.


• If $\alpha \in I$ and $\rho \in R$ then $\rho\alpha \in I$.


He made these the defining properties of an ideal $I$ in a ring $R$.

Ideals first showed their fruitfulness in number theory, where they 
allowed algebraic integers to be used freely while preserving the analogy 
with ordinary integers. But ideals soon found other applications, starting 
with their use by Dedekind and Weber (1882) in algebraic geometry, where 
they are applied to fields of algebraic functions. Today, ideals are such a 
fundamental part of ring theory that algebra books often introduce them 
without explaining that the word “ideal” came from “ideal numbers.” 


EXERCISES 


16.8.1 Show that $\{4m : m \in \mathbb{Z}\}$ and $\{6n : n \in \mathbb{Z}\}$ are ideals in $\mathbb{Z}$.
16.8.2 Show also that $\{4m + 6n : m, n \in \mathbb{Z}\}$ is an ideal, which equals $\{2k : k \in \mathbb{Z}\}$.


16.9 Ideal Prime Factorization 


To see how ideals of algebraic integers might preserve the analogy with
ordinary integers, we begin by rewriting the theory of divisibility and gcd
in $\mathbb{Z}$ in terms of ideals. This suggests appropriate definitions of divisor
and gcd for ideals of algebraic integers, and leads to the discovery that the
two factorizations

$$6 = 2 \cdot 3 = (1 + \sqrt{-5})(1 - \sqrt{-5})$$

arise from a single ideal prime factorization, as Kummer hoped.


Ideals in $\mathbb{Z}$

In $\mathbb{Z}$ we have the commonplace facts that
2 divides 6, 3 divides 6, gcd(2, 3) = 1.
These facts can be rewritten in terms of the sets
(2) = {multiples of 2}, (3) = {multiples of 3}, (6) = {multiples of 6},
which are examples of ideals. The equivalents of the first two facts are
(2) contains (6), (3) contains (6),


which may be summed up by the slogan to divide is to contain. To express 
the third fact we consider another ideal, the sum of (2) and (3): 


$$(2) + (3) = \{a + b : a \in (2),\ b \in (3)\}.$$


It is clear that gcd(2, 3) divides any member of the set (2) + (3), and in fact 
it is not hard to show that 


(2) + (3) = {multiples of 1} = (1) = (gcd(2, 3)). 


In general, for any $a \in \mathbb{Z}$, the set $(a)$ = {multiples of $a$} is obviously
an ideal, called the principal ideal generated by $a$. It is not hard to prove
(see the exercises below) that


• every ideal in $\mathbb{Z}$ is $(a)$ for some $a$,
• $a$ divides $b$ $\Leftrightarrow$ $(a)$ contains $(b)$,
• $(a) + (b) = (\gcd(a, b))$.
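
The third fact can be spot-checked numerically. The sketch below is only an illustration under stated assumptions: it compares the two sets inside a finite window of integers, the function names are hypothetical, and the coefficient range is a generous search bound chosen for small nonzero $a$ and $b$ rather than anything derived in the text. A proof is still needed for the general statement.

```python
from math import gcd

def ideal_sum_window(a, b, bound):
    """Members of (a) + (b) = {a*m + b*n : m, n in Z} of absolute value <= bound.

    The coefficients m, n are searched over a finite range; the range used
    here is an assumption of this sketch, adequate for small nonzero a, b."""
    reach = bound * max(abs(a), abs(b))
    return {a * m + b * n
            for m in range(-reach, reach + 1)
            for n in range(-reach, reach + 1)
            if abs(a * m + b * n) <= bound}

def principal_window(g, bound):
    """Members of the principal ideal (g) of absolute value <= bound."""
    return {k for k in range(-bound, bound + 1) if k % g == 0}

# (2) + (3) agrees with (1), and (4) + (6) agrees with (2), inside the window.
print(ideal_sum_window(2, 3, 12) == principal_window(gcd(2, 3), 12))
print(ideal_sum_window(4, 6, 20) == principal_window(gcd(4, 6), 20))
```

The second comparison is the set appearing in Exercise 16.8.2.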


Since ideals in $\mathbb{Z}$ correspond to numbers in $\mathbb{Z}$, the language of ideals
tells us nothing new about $\mathbb{Z}$. However, the concept of ideal gives us new
insight into the ring $\mathbb{Z}[\sqrt{-5}] = \{a + b\sqrt{-5} : a, b \in \mathbb{Z}\}$,
where not every ideal is principal.

Ideals in $\mathbb{Z}[\sqrt{-5}]$


One such ideal is the sum $I$ of the principal ideals $(2)$ and $(1 + \sqrt{-5})$,
$$\{2u + (1 + \sqrt{-5})v : u, v \in \mathbb{Z}[\sqrt{-5}]\},$$
which happens to equal
$$\{2m + (1 + \sqrt{-5})n : m, n \in \mathbb{Z}\}.$$
We expect (by analogy with $\mathbb{Z}$) this ideal to be the gcd of the principal
ideals $(2)$ and $(1 + \sqrt{-5})$. In Kummer’s terms it is the set of multiples
of the “ideal number” $\gcd(2, 1 + \sqrt{-5})$. It can be seen from a picture of
part of $I$ (the black dots in Figure 16.3) that $I$ is not a principal ideal.


Figure 16.3: Multiples of $\gcd(2, 1 + \sqrt{-5})$


This is because a principal ideal $(\alpha)$ in $\mathbb{Z}[\sqrt{-5}]$ is simply
$\mathbb{Z}[\sqrt{-5}]$ multiplied by $\alpha$, so it looks like the rectangular array
$\mathbb{Z}[\sqrt{-5}]$, only magnified by $|\alpha|$ and rotated by the argument of
$\alpha$. In particular, a principal ideal is a rectangular array. But it is
clear from Figure 16.3 that the black dots do not form rectangles.


It can be seen similarly that the ideals

$$(3) + (1 + \sqrt{-5}) = (\gcd(3, 1 + \sqrt{-5})) \quad\text{and}\quad (3) + (1 - \sqrt{-5}) = (\gcd(3, 1 - \sqrt{-5}))$$

are not principal ideals. But these ideals are plausible ideal factors of the
numbers 2, 3, $1 + \sqrt{-5}$, $1 - \sqrt{-5}$ found in the two factorizations
of 6. We have only to explain what it means to multiply ideals.

Product of Ideals


Dedekind (1871) defined the product of ideals $A$, $B$ to consist of all finite
sums of products $a_i b_i$, where $a_i \in A$ and $b_i \in B$. Thus

$$AB = \{a_1 b_1 + \cdots + a_n b_n : a_1, \ldots, a_n \in A,\ b_1, \ldots, b_n \in B\}.$$

This concept of product agrees with the idea that “to divide is to contain”
because each $a_i b_i \in A$ and hence $a_1 b_1 + \cdots + a_n b_n \in A$, so
$A \supseteq AB$ and therefore $A$ divides $AB$. Similarly, $B$ divides $AB$.

The ideals

$$A = (2) + (1 + \sqrt{-5}), \quad B = (3) + (1 + \sqrt{-5}), \quad \bar{B} = (3) + (1 - \sqrt{-5})$$

provide interesting examples of products, namely

$$A^2 = (2), \quad AB = (1 + \sqrt{-5}), \quad A\bar{B} = (1 - \sqrt{-5}), \quad B\bar{B} = (3),$$

which may be verified in the exercises below. It follows that the two
factorizations

$$6 = 2 \cdot 3, \quad 6 = (1 + \sqrt{-5})(1 - \sqrt{-5})$$

can finally be reconciled by splitting them further into the single
factorization of ideals

$$(6) = ABA\bar{B} = A^2 B\bar{B}.$$
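
These products can also be confirmed by machine. The following Python sketch is one possible approach and is not taken from the text: an element $a + b\sqrt{-5}$ is stored as the pair `(a, b)`, an ideal is treated as a lattice in $\mathbb{Z}^2$, and each lattice is reduced to a canonical basis $(p, 0), (q, r)$ with $p, r > 0$ and $0 \le q < p$, so that two ideals are equal exactly when their triples $(p, q, r)$ agree. All function names are hypothetical.

```python
from math import gcd

def mul(u, v):
    """Product of u = (a, b) and v = (c, d), where (a, b) means a + b*sqrt(-5)."""
    a, b = u
    c, d = v
    return (a * c - 5 * b * d, a * d + b * c)

def lattice_normal_form(vectors):
    """Canonical triple (p, q, r) for the lattice spanned over Z by the given
    vectors: the lattice has Z-basis (p, 0), (q, r) with p, r > 0, 0 <= q < p."""
    w, p = (0, 0), 0
    for v in vectors:
        # Euclid's algorithm on second coordinates, applied to whole vectors.
        while v[1] != 0:
            k = w[1] // v[1]
            w, v = v, (w[0] - k * v[0], w[1] - k * v[1])
        p = gcd(p, v[0])               # v now has second coordinate 0
    if w[1] < 0:
        w = (-w[0], -w[1])
    if p == 0 or w[1] == 0:
        raise ValueError("vectors do not span a full-rank lattice")
    return (p, w[0] % p, w[1])

def ideal(*generators):
    """Ideal of Z[sqrt(-5)] generated by the given elements: the Z-span of
    each generator g together with g*sqrt(-5)."""
    return lattice_normal_form(
        list(generators) + [mul(g, (0, 1)) for g in generators])

def ideal_product(I, J):
    """Product of ideals: the Z-span of all products of Z-basis elements."""
    basis = lambda K: [(K[0], 0), (K[1], K[2])]
    return lattice_normal_form([mul(u, v) for u in basis(I) for v in basis(J)])

A = ideal((2, 0), (1, 1))
B = ideal((3, 0), (1, 1))
B_bar = ideal((3, 0), (1, -1))

print(ideal_product(A, A) == ideal((2, 0)))          # A^2 = (2)
print(ideal_product(A, B) == ideal((1, 1)))          # AB = (1 + sqrt(-5))
print(ideal_product(A, B_bar) == ideal((1, -1)))     # A B-bar = (1 - sqrt(-5))
print(ideal_product(B, B_bar) == ideal((3, 0)))      # B B-bar = (3)
```

Taking the Z-span of basis products is legitimate because every member of a product ideal is a finite sum of products $a_i b_i$, and each $a_i$, $b_i$ is an integer combination of the basis elements.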


Also, the ideals $A$, $B$, $\bar{B}$ are prime because they are maximal; that is,
each is properly contained only in the ideal $\mathbb{Z}[\sqrt{-5}]$ itself. The
reason for considering maximal ideals to be prime is that a prime ideal $P$ is
defined as an ideal, not equal to the whole ring, with the prime divisor
property:

If $P$ divides $AB$ then $P$ divides $A$ or $P$ divides $B$.

Since “to divide is to contain,” an equivalent statement of this property is:

If $P \supseteq AB$ then $P \supseteq A$ or $P \supseteq B$.

It is easily checked, using the above definition of the product $AB$, that a
maximal ideal $P$ satisfies this condition.
In $\mathbb{Z}[\sqrt{-5}]$ it is quite easy to show that $A$, $B$, $\bar{B}$ are all
maximal, so

$$(6) = ABA\bar{B}$$

is a prime ideal factorization. Moreover, it can be shown to be unique.

Dedekind (1871) showed that unique prime ideal factorization holds not
only for the number 6 in $\mathbb{Z}[\sqrt{-5}]$ but for any algebraic integer in the
ring of integers of a field of finite dimension over $\mathbb{Q}$.

Dedekind’s breakthrough inspired Emmy Noether in the 1920s to 
develop a general theory of rings and ideals, which became the foundation 
of modern research in algebraic number theory and algebraic geometry. 
Noether (1926) was able to describe precisely which rings admit unique 
prime ideal factorization. Today they are called Dedekind rings. 


EXERCISES 
First, let us check two properties claimed above for ideals in $\mathbb{Z}$.


16.9.1 If $I \subseteq \mathbb{Z}$ is a nonzero ideal, use the division property in $\mathbb{Z}$
to prove that $I = (a)$, where $a$ is the smallest positive member of $I$.


16.9.2 Deduce from 16.9.1 that $(a) + (b) = (\gcd(a, b))$.

Now we turn our attention to ideals in $\mathbb{Z}[\sqrt{-5}]$.


16.9.3 Check that

$$(2) + (1 + \sqrt{-5}) = \{2u + (1 + \sqrt{-5})v : u, v \in \mathbb{Z}[\sqrt{-5}]\} = \{2m + (1 + \sqrt{-5})n : m, n \in \mathbb{Z}\}.$$


16.9.4 Letting $A = \{2m + (1 + \sqrt{-5})n : m, n \in \mathbb{Z}\}$, verify that each
member of $A^2$ is a multiple of 2.


16.9.5 Show in turn that $1 - \sqrt{-5} \in A$, $6 \in A^2$, and $4 \in A^2$. Deduce
that $2 \in A^2$ and hence that $A^2 = (2)$.


16.9.6 Letting $B = (3) + (1 + \sqrt{-5})$ and $\bar{B} = (3) + (1 - \sqrt{-5})$, show
similarly that $AB = (1 + \sqrt{-5})$, $A\bar{B} = (1 - \sqrt{-5})$, and
$B\bar{B} = (3)$.


17 Sets, Logic, and Computation


PREVIEW 


In the 19th century, perennial concerns about the role of infinity in 
mathematics were finally addressed by the development of set theory and 
formal logic. Set theory was proposed as a mathematical theory of infinity 
and formal logic was proposed as a mathematical theory of proof (partly 
to avoid the paradoxes that seem to arise when reasoning about infinity). 

In this chapter we discuss these two developments, whose interaction 
led to mind-bending consequences in the 20th century. Both set theory and 
logic throw completely new light on the question, “What is mathematics?” 
But they turn out to be double-edged swords. 


• Set theory brings remarkable clarity to the concept of infinity, but it
shows infinity to be unexpectedly complicated; in fact, more complicated
than set theory itself can describe.


• Formal logic encompasses all known methods of proof, but at the
same time it shows these methods to be incomplete. In particular,
any reasonably strong system of logic cannot prove its own consistency.


• Formal logic is the origin of the concept of computability, which
gives a rigorous definition of an algorithmically solvable problem.
However, some important problems turn out to be unsolvable.


© Springer Nature Switzerland AG 2020
J. Stillwell, Mathematics and Its History, Undergraduate Texts in Mathematics,
https://doi.org/10.1007/978-3-030-55193-3_17




17.1 Sets 


Sets became part of mathematics in the late 19th century through attempts 
to understand the real numbers. Our intuition of the real numbers—that 
they form a line without gaps—is a mystery that mathematicians have 
struggled to explain since ancient times. It underlies the concept of motion 
that Zeno tried to challenge with his paradoxes; it resurfaced with calcu- 
lus in the 17th century; and it intruded into algebra when Gauss used the 
intermediate value theorem in his 1816 proof of the fundamental theorem 
of algebra. As we mentioned in Section 11.4, Bolzano (1817) realized that 
the intermediate value theorem demands a proof, but he did not have a 
concept of real number on which a proof could be soundly based. 

Bolzano did, however, realize the need for a completeness property of 
R that expresses the absence of gaps. He identified the least upper bound 
property, that every bounded set of real numbers has a least upper bound, 
and the equivalent nested interval property, that if 


$$a_0 \le a_1 \le a_2 \le \cdots \le b_2 \le b_1 \le b_0$$
then there is a number $x$ such that

$$a_0 \le a_1 \le a_2 \le \cdots \le x \le \cdots \le b_2 \le b_1 \le b_0.$$


To prove such properties, we have to answer the question, what is a real 
number? Several equivalent answers were given around 1870, all involving 
infinite sets or sequences. The simplest was that of Dedekind (1872), who 
defined a real number to be a partition (or cut) of the rational numbers into 
two sets, L and U, such that each member of L is less than all members of 
U. If one has a preconceived notion of real number, such as a point x on 
a line, then L and U are uniquely determined by x as the sets of rational
points to left and right of it, respectively. Thus if x is preconceived, then L 
and U are no more than auxiliary concepts that enable x to be handled in 
terms of rationals, as Eudoxus did (Section 4.2). Dedekind’s breakthrough 
was to realize that no preconceived x is necessary: x is defined by the pair 
(L, U). Thus the concept of sets of rationals became a basis for the concept 
of real number. 

Dedekind cuts give a precise model for the continuous number line $\mathbb{R}$,
since they fill all the gaps in the rationals. Indeed, wherever there is a gap
in the rationals, the object that fills it is essentially the gap itself: the pair
of sets $L$, $U$ to left and right of it. Other formulations of this completeness
property of $\mathbb{R}$ are also easy consequences of Dedekind’s definition. For
example, each bounded set of reals $(L_i, U_i)$ has a least upper bound $(L, U)$:
$L$ is simply the union of the sets $L_i$.
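
To make the idea of a cut concrete, here is a small Python sketch. It is an illustration only: the choice of the gap at the positive solution of $x^2 = 2$ is ours, lower sets are modeled as membership tests on exact rationals, and the function names are hypothetical. The sketch shows that this lower set has no greatest member (so the gap is not filled by any rational) and that a least upper bound is obtained as a union of lower sets.

```python
from fractions import Fraction

def lower_sqrt2(q):
    """Membership test for the lower set L of the cut at the gap x*x = 2."""
    return q < 0 or q * q < 2

def larger_member(q):
    """For a rational q >= 0 in L, return a larger rational still in L.
    With q' = (2q + 2)/(q + 2) one has q' - q = (2 - q*q)/(q + 2) > 0 and
    2 - q'*q' = 2(2 - q*q)/(q + 2)**2 > 0."""
    return (2 * q + 2) / (q + 2)

def union_of_lowers(lowers):
    """Lower set of the least upper bound: the union of the given lower sets."""
    return lambda q: any(lower(q) for lower in lowers)

q = Fraction(7, 5)
while q.denominator < 1000:            # L has no greatest member
    print(q, lower_sqrt2(q))
    q = larger_member(q)

# Least upper bound of the rational cuts at 1, 7/5, 141/100.
cuts = [lambda x, r=r: x < r
        for r in (Fraction(1), Fraction(7, 5), Fraction(141, 100))]
sup_lower = union_of_lowers(cuts)
print(sup_lower(Fraction(7, 5)), sup_lower(Fraction(141, 100)))
```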

Dedekind seemed to have settled the ancient problem of explaining
the continuous in terms of the discrete, but in penetrating as far as he
did, he also uncovered deeper problems. The central problem is that the
completeness of $\mathbb{R}$ entails its uncountability, a phenomenon discovered
by Cantor (1874). The countable sets are those that can be put in one-to-one
correspondence with $\mathbb{N} = \{0, 1, 2, \ldots\}$. They include the set of
rationals and also the set of algebraic numbers, as Cantor learned from
Dedekind. But if $\mathbb{R}$ is countable, this means that all reals can be included
in a sequence $x_0, x_1, x_2, \ldots$. Cantor (1874) showed that this is
impossible by selecting from each sequence $\{x_n\}$ of distinct reals a
subsequence $a_0, b_0, a_1, b_1, a_2, b_2, \ldots$, such that

$$a_0 < a_1 < a_2 < \cdots < b_2 < b_1 < b_0$$

and with each $x_n$ outside one of the nested intervals
$(a_0, b_0) \supset (a_1, b_1) \supset (a_2, b_2) \supset \cdots$. It follows that any
common element of all the $(a_n, b_n)$ is a real $x \ne$ each $x_n$. A common
element obviously exists if the sequence of intervals is finite, and if the
sequence is infinite, it exists by completeness, as the least upper bound of
the $a_n$. The common element $x$ is a “gap” in the given sequence $\{x_n\}$.
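
The selection of the $a_n$ and $b_n$ can be tried out on a finite initial segment of a sequence. The Python sketch below follows the rule spelled out in the exercises to this section ($a_0 = x_0$, $b_0$ is the first later term above $a_0$, and each new endpoint is the first later term inside the current interval). The function names and the particular enumeration of rationals are assumptions made for illustration, and on a finite list the process simply stops when no further term qualifies.

```python
from fractions import Fraction
from math import gcd

def rationals_in_unit_interval(max_q):
    """Rationals p/q in (0, 1) in lowest terms, listed by increasing q, then p."""
    return [Fraction(p, q)
            for q in range(2, max_q + 1)
            for p in range(1, q) if gcd(p, q) == 1]

def cantor_intervals(xs):
    """Run Cantor's 1874 selection on a finite list of distinct numbers and
    return the nested intervals (a_0, b_0), (a_1, b_1), ... that it finds."""
    it = iter(xs)                       # shared iterator: "first x_m after ..."
    a = next(it)                        # a_0 = x_0
    b = next((x for x in it if a < x), None)
    if b is None:
        return []
    intervals = [(a, b)]
    while True:
        a_next = next((x for x in it if a < x < b), None)
        if a_next is None:
            break
        b_next = next((x for x in it if a_next < x < b), None)
        if b_next is None:
            break
        a, b = a_next, b_next
        intervals.append((a, b))
    return intervals

xs = rationals_in_unit_interval(12)
for a, b in cantor_intervals(xs):
    print(a, b)
```

Every term of the list up to and including the last chosen $b_n$ lies outside the open interval $(a_n, b_n)$, which is the finite shadow of the gap property.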

This method, though ingenious, is by no means the easiest way to 
prove that R is uncountable. In Section 17.5 we will see a simpler method 
that Cantor discovered later. Another simple method, using the concept of 
measure, is in Section 17.3. 


EXERCISES 


Cantor’s 1874 proof of the uncountability of $\mathbb{R}$ is based on the following
construction. Given a sequence $x_0, x_1, x_2, \ldots$ of distinct reals, he found
a gap in them by picking out $a_0, b_0, a_1, b_1, \ldots$ as follows:

$a_0 = x_0$,

$b_0$ = first $x_m$ with $a_0 < x_m$,

$a_1$ = first $x_m$ after $b_0$ with $a_0 < x_m < b_0$,

$b_1$ = first $x_m$ after $a_1$ with $a_1 < x_m < b_0$,

$a_2$ = first $x_m$ after $b_1$ with $a_1 < x_m < b_1$.

17.1.1 Explain why the sequence $a_0, b_0, a_1, b_1, a_2, b_2, \ldots$ has the gap
property described above: each $x_n$ is outside one of the nested intervals
$(a_0, b_0) \supset (a_1, b_1) \supset (a_2, b_2) \supset \cdots$.


We now explore how far we can enlarge the set of natural numbers and still 
have a countable set. 


17.1.2 Give a rule for continuing the sequence

$$\frac{1}{1},\ \frac{2}{1},\ \frac{1}{2},\ \frac{3}{1},\ \frac{2}{2},\ \frac{1}{3},\ \frac{4}{1},\ \frac{3}{2},\ \ldots$$

so as to include all positive rationals.

17.1.3 How can one then conclude that the set of all rationals is countable?


17.1.4 The words on a fixed finite alphabet can be enumerated by listing first the 
one-letter words, then the two-letter words, and so on. Use this observation 
to show that the set of polynomial equations with integer coefficients is 
countable and hence that the set of algebraic numbers is countable. 


Cantor used the latter result to prove the existence of transcendental numbers.
Namely, let $\{x_n\}$ be the sequence of algebraic numbers; we know that these are
not all the real numbers, so any other real number is transcendental.


17.2 Ordinals


The uncountability of $\mathbb{R}$ has been a great challenge to set theorists and
logicians ever since its discovery. The most successful response to this
challenge has been the theory of ordinal numbers. This grew out of Cantor’s
(1872) investigation of trigonometric series, which leads to the problem
of analyzing the complexity of point sets. Cantor measured complexity by
the number of iterations of the prime operation ($'$) of taking the limit points
of a set. For example, if $S = \{0, 1/2, 3/4, 7/8, \ldots, 1\}$, then the prime
operation can be applied once, and $S' = \{1\}$. It can happen that $S'$ itself has
limit points, so that $S''$ also exists. In fact, one can find a set $S$ for which
$S', S'', \ldots, S^{(n)}, \ldots$ exist for all finite $n$, so one can envisage
iterating the prime operation an infinite number of times. In the case where
all the $S^{(n)}$ exist, Cantor (1880) took their intersection, thereby defining

$$S^{(\infty)} = \bigcap_{n = 1, 2, 3, \ldots} S^{(n)}.$$


He viewed $\infty$ as the first infinite ordinal number. To avoid confusion with
higher infinite numbers soon to appear, I will use the modern notation $\omega$
for the first infinite ordinal.

Having made the leap to $\omega$, it is easy to go further:
$(S^{(\omega)})' = S^{(\omega+1)}$, $(S^{(\omega+1)})' = S^{(\omega+2)}$, ..., and the
intersection of this new infinite sequence is $S^{(\omega \cdot 2)}$, where
$\omega \cdot 2$ is the first infinite number after $\omega, \omega + 1, \omega + 2, \ldots$.
After $\omega \cdot 2$, one has

$$\omega \cdot 2 + 1,\ \omega \cdot 2 + 2,\ \ldots,\ \omega \cdot 3,\ \ldots,\ \omega \cdot 4,\ \ldots,\ \ldots,\ \omega \cdot \omega,\ \ldots.$$


All these ordinal numbers can actually be realized as numbers of iterations 
of the prime operation on sets of reals. We can also investigate the ordinal 
numbers independently of this realization, as an extension of the concept 
of natural number. 

Cantor (1883) viewed the ordinals as the result of two operations:


(i) Successor, which for each ordinal $\alpha$ gives the next ordinal, $\alpha + 1$.


(ii) Least upper bound, which for each set $\{\alpha_i\}$ of ordinals gives the
least ordinal $\ge$ each $\alpha_i$.


The most elegant formalization of these notions was given by von Neumann
(1923). The empty set $\emptyset$ (not considered by Cantor) is taken to be
the ordinal 0, the successor of $\alpha$ is $\alpha \cup \{\alpha\}$, and the least
upper bound of $\{\alpha_i\}$ is simply the union of the $\alpha_i$. Thus

$0 = \emptyset$,
$1 = \{0\}$,
$2 = \{0, 1\}$,
$\vdots$
$\omega = \{0, 1, 2, \ldots, n, \ldots\}$,
$\omega + 1 = \{0, 1, 2, \ldots, n, \ldots, \omega\}$,

and so on. The natural ordering of the ordinals is then given by set
membership, $\in$, and, in particular, the members of an ordinal $\alpha$ are all
ordinals smaller than $\alpha$.
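
For finite ordinals the von Neumann definitions can be carried out literally. The following Python sketch, with hypothetical function names, models sets as `frozenset` values so that sets can be members of other sets; it covers only finite ordinals, since $\omega$ itself is an infinite set.

```python
def successor(alpha):
    """The successor of alpha is the set alpha ∪ {alpha}."""
    return alpha | frozenset([alpha])

def finite_ordinal(n):
    """Build the von Neumann ordinal n, starting from 0 = the empty set."""
    alpha = frozenset()
    for _ in range(n):
        alpha = successor(alpha)
    return alpha

def least_upper_bound(ordinals):
    """The least upper bound of a set of ordinals is their union."""
    return frozenset().union(*ordinals)

three = finite_ordinal(3)
print(len(three))                       # the ordinal 3 has three members
print(finite_ordinal(2) in three)       # ordering by membership: 2 is below 3
print(least_upper_bound([finite_ordinal(1), finite_ordinal(4)]) == finite_ordinal(4))
```

Membership and proper inclusion coincide on these sets, which reflects the fact that the members of an ordinal are exactly the smaller ordinals.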

Cantor’s principle (ii) generates ordinals of breathtaking size, since it
gives the power to transcend any set of ordinals already defined. In
particular, an ordinal of uncountable size is on the horizon as soon as one
thinks of the concept of countable ordinal, as Cantor did (1883). He defined an
ordinal $\alpha$ to be countable (or, as he later put it, of cardinality or
cardinal number $\aleph_0$) if $\alpha$ could be put in one-to-one correspondence
with $\mathbb{N}$. For example,
$$\omega \cdot 2 = \{0, 1, 2, \ldots, \omega, \omega + 1, \omega + 2, \ldots\}$$
is countable because of its obvious correspondence with
$$\mathbb{N} = \{0, 2, 4, \ldots, 1, 3, 5, \ldots\}.$$
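
The correspondence can be written down explicitly. In the Python sketch below, which uses an encoding of our own choosing, the pair `(0, n)` stands for the ordinal $n$ and `(1, n)` for $\omega + n$; finite ordinals go to even numbers and the ordinals $\omega + n$ to odd numbers.

```python
def to_natural(element):
    """Send n to 2n and omega + n to 2n + 1.
    Encoding (an assumption of this sketch): (0, n) is n, (1, n) is omega + n."""
    block, n = element
    return 2 * n + block

def from_natural(k):
    """Inverse map: even k comes from k/2, odd k from omega + (k - 1)/2."""
    return (k % 2, k // 2)

print([to_natural((0, n)) for n in range(4)])   # images of 0, 1, 2, 3
print([to_natural((1, n)) for n in range(4)])   # images of omega, omega+1, ...
print(from_natural(7))                          # the element sent to 7
```

The map is one-to-one and onto but does not preserve order, which is why $\omega \cdot 2$ and $\omega$ have the same cardinality yet are different ordinals.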


The least upper bound of the countable ordinals is the least uncountable
ordinal, $\omega_1$. Sets in one-to-one correspondence with $\omega_1$ are of the next
cardinality, $\aleph_1$. Ordinals of cardinality $\aleph_1$ have a least upper bound
$\omega_2$ of cardinality $\aleph_2$, and so on. ($\aleph$ is aleph, the first letter
of the Hebrew alphabet.)

Having found this orderly way of generating successive uncountable 
cardinals, Cantor reconsidered the uncountable set R. Although no method 
of generating members of R in the manner of ordinals was apparent, Cantor
conjectured that the cardinality of R was $\aleph_1$. This conjecture has since
become known as the continuum hypothesis. By 1900 it was recognized as 
the outstanding open problem of set theory, and Hilbert (1900a) made it 
number one on the famous list of problems he presented to the mathemati- 
cal community. There have been two outstanding results on the continuum 
problem since 1900, but together they seem to make it harder to know 
whether the continuum hypothesis is true. Gödel (1938) showed that the
continuum hypothesis is consistent with standard axioms for set theory, 
but Cohen (1963) showed that its negation is also consistent. Thus the 
continuum hypothesis is independent of standard set theory, in the same 
way that the parallel postulate is independent of Euclid’s other postulates. 
Whether this means that the notion of “set” is open to different natural 
interpretations, like the notion of “straight line,” is not yet clear. 


EXERCISES 


For each countable ordinal $\alpha$ there is a set of rationals in [0,1] with
order type $\alpha$. For example, the set $\{0, 1/2, 3/4, 7/8, \ldots\}$ has order
type $\omega$.


17.2.1 Give an example of a set of rationals in [0,1] with order type $\omega \cdot 2$.
17.2.2 Give an example of a set of rationals in [0,1] with order type $\omega \cdot \omega$.


17.2.3 Given sets of rationals in [0,1] with order types
$\alpha_1, \alpha_2, \alpha_3, \ldots$, explain how to obtain a set of rationals in
[0,1] with order type at least as large as the least upper bound of
$\{\alpha_1, \alpha_2, \alpha_3, \ldots\}$.


17.2.4 Explain why there is a set of rationals in [0,1], with order type
$\alpha$, for each countable ordinal $\alpha$.


17.3 Measure


Cantor’s reason for investigating sets of discontinuities in the theory of
trigonometric series goes back to the discovery of Fourier (1822) that these
series depend on integrals. Assuming that


$$f(x) = \frac{1}{2}a_0 + \sum_{n=1}^{\infty} (a_n \cos n\pi x + b_n \sin n\pi x),$$

Fourier derived the formulas

$$a_n = \int_{-1}^{1} f(x) \cos n\pi x \, dx, \qquad b_n = \int_{-1}^{1} f(x) \sin n\pi x \, dx.$$


Thus the existence of the series depends on the existence of the integrals
for $a_n$ and $b_n$, and this in turn depends on how discontinuous $f$ is. It
was known (though not rigorously proved) that every continuous function 
has an integral, so the next question was how the integral should, or could, 
be defined for discontinuous functions. The first precise answer was the 
Riemann (1854a) integral concept, familiar to all calculus students, and 
based on approximating the integrand by step functions. Any bounded 
function with a finite number of discontinuities has a Riemann integral, 
and indeed so have certain functions with infinitely many discontinuities, 
but not all. The classic function for which the Riemann integral does not 
exist is the function of Dirichlet (1829): 


$$f(x) = \begin{cases} 1 & \text{if } x \text{ is rational,} \\ 0 & \text{if } x \text{ is irrational.} \end{cases}$$


Eventually a more general integral, the Lebesgue integral, was intro- 
duced to cope with such functions, but not until the focus of attention had 
shifted from the problem of integration to the more fundamental problem 
of measure. Measure generalizes the concept of length (on the line R), area 
(in the plane $\mathbb{R}^2$), and so on, to quite general point sets. Since an integral
can be viewed as the area under a graph, its dependence on the concept of 
measure is clear, though it was not immediately realized that the measure 
of sets on the line had to be clarified first. 

The need for clarification arose from the discovery of Harnack (1885)
that any countable subset $\{x_0, x_1, x_2, \ldots\}$ of $\mathbb{R}$ could be covered by a
collection of intervals of arbitrarily small total length. Namely, cover $x_0$
by an interval of length $\varepsilon/2$, $x_1$ by an interval of length
$\varepsilon/4$, $x_2$ by an interval of length $\varepsilon/8$, ..., so that the total
length of intervals used is $\le \varepsilon$. (This is another proof, by the way,
that $\mathbb{R}$ is not a countable set.) This seemed to show that countable sets
were “small” (of measure zero, as we now say), but mathematicians were
reluctant to say this of dense countable sets, like the rationals. The first
response, by Jordan (1892), was to define measure analogously to the Riemann
integral, using finite unions of intervals to approximate subsets of $\mathbb{R}$.
Under this definition, “sparse” countable sets like
$\{0, 1/2, 3/4, 7/8, \ldots\}$ did have measure zero, but dense sets like the
rationals were not measurable at all.
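
Harnack’s covering is easy to reproduce for the first few terms of a sequence. The Python sketch below uses exact rational arithmetic; the function names and the choice to center each interval on its point are assumptions of the illustration. For the first $N$ points the total length is $\varepsilon(1 - 2^{-N})$, which stays below $\varepsilon$ however large $N$ becomes.

```python
from fractions import Fraction

def harnack_cover(points, eps):
    """Cover points[n] by an interval of length eps / 2**(n + 1), centered
    on the point (centering is a choice made for this sketch)."""
    cover = []
    for n, x in enumerate(points):
        half = eps / 2 ** (n + 2)       # half of the length eps / 2**(n + 1)
        cover.append((x - half, x + half))
    return cover

def total_length(cover):
    """Sum of the lengths of the intervals (overlaps are counted twice)."""
    return sum(right - left for left, right in cover)

points = [Fraction(p, q) for q in range(1, 6) for p in range(0, q + 1)]
eps = Fraction(1, 100)
cover = harnack_cover(points, eps)
print(total_length(cover) < eps)
```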


The first to take the hint from Harnack’s result that countable unions 
of intervals should be used to measure subsets of R was Borel (1898). 
He defined the measure of any interval to be its length, and he extended 
measurability to more and more complicated sets by complementation and 
countable disjoint unions. That is, if a set $S$ contained in an interval $I$ has
measure $\mu(S)$, then
$$\mu(I - S) = \mu(I) - \mu(S),$$

and if $S$ is a disjoint union of sets $S_n$ with measures $\mu(S_n)$, then
$$\mu(S) = \sum_{n=1}^{\infty} \mu(S_n).$$


The sets that can be formed from intervals by complementation and count- 
able unions are now called Borel sets. Borel’s idea was pushed to its log- 
ical conclusion by Lebesgue (1902), who assigned measure zero to any 
subset of a Borel set of measure zero. Since not all such sets are Borel, 
this extended measurability to a larger class of sets: those that differ from 
Borel’s by sets of measure zero. It can be proved that the class of Lebesgue 
measurable sets has the same cardinality as the class of all subsets of R. 
But whether the measurable sets are all subsets of R is an interesting ques- 
tion to which we return shortly. 


The distinctive property of Borel-Lebesgue measure is countable additivity:
if $S_0, S_1, S_2, \ldots$ are disjoint measurable sets, then

$$\mu(S_0 \cup S_1 \cup S_2 \cup \cdots) = \mu(S_0) + \mu(S_1) + \mu(S_2) + \cdots.$$

This follows easily from Borel’s definition of measure for countable disjoint
unions, because any countable union can be reassembled as a countable
disjoint union.

Lebesgue showed that countable additivity gives a concept of integral
that is better behaved with respect to limits than the Riemann integral. For
example, one has the monotone convergence property: if $f_0, f_1, f_2, \ldots$ is
an increasing sequence of positive integrable functions, and $f_n \to f$ as
$n \to \infty$, then $\int f_n \, dx \to \int f \, dx$ for the Lebesgue integral,
whereas this is not generally true for the Riemann integral (see Exercise
17.3.1).

It could be said that set theory paved the way for measure theory by 
showing the uncountability of R, thus enabling countable subsets of R to 
be regarded as “small.” On the other hand, measure theory itself shows 
the uncountability of R (by Harnack’s result), and in fact measure theory’s 
assessment of the smallness of countable sets greatly influenced the later 
development of set theory. 

“Measure theoretically desirable” axioms, such as the measurability 
of all subsets of R, turned out to conflict with “set theoretically desirable” 
axioms such as the continuum hypothesis, and efforts to resolve the con- 
flict brought to light more fundamental questions about sets. These ques- 
tions do not reduce to clear-cut alternatives—the way geometric questions 
reduce to alternative parallel axioms, for example—but they do seem to 
gravitate toward the choice and large cardinal axioms, discussed in the 
next section. 


EXERCISES 


17.3.1 Show that a function $f_n$ that is zero at all but $n$ points has Riemann
integral zero over any interval and that the non-Riemann integrable function
of Dirichlet is a limit as $n \to \infty$ of such functions $f_n$.


The complexity of Borel sets may be roughly measured by the number of count- 
able unions and complements needed to define them. Here are a few of the simpler 
ones. 


17.3.2 Show that a single point is the complement of a countable union of intervals 
and hence that any countable set is a Borel set. 


17.3.3 Deduce that the set of irrational numbers is a Borel set.


17.3.4 What is the measure of the set of irrationals between 0 and 1? 


17.4 Axiom of Choice and Large Cardinals 


In its usual formulation, the axiom of choice states that any set $S$ (of
nonempty sets) has a choice function $f$ such that $f(x) \in x$ for each $x \in S$.
(Thus $f$ “chooses” an element from each set $x$ in $S$.) The axiom seems
so plausible that early set theorists used it almost unconsciously, and it 
first attracted attention in Zermelo’s (1904) proof that any set S could be 
well ordered (that is, put in one-to-one correspondence with an ordinal). 
This looked like progress toward the continuum hypothesis. But Zermelo’s 
proof gave no more than the existence of a well-ordering of S, given a 
choice function for the set of subsets of S. There was still no sign of an 
explicit well-ordering of R. And of course if one doubted the existence 
of a well-ordering of R, this threw doubt on the axiom of choice. Further 
doubts were raised when the axiom of choice was found to have incredible 
consequences in measure theory. 

The first of these, discovered by Vitali (1905), was that the circle can 
be decomposed into countably many disjoint congruent sets. Since con- 
gruent sets have the same Lebesgue measure, it easily follows that the 
sets in question are not Lebesgue measurable (by countable additivity; see 
Exercises 17.4.2—17.4.4). 

Even more paradoxical decompositions were given by Hausdorff (1914) 
(for the sphere) and Banach and Tarski (1924) (for the ball). The Banach— 
Tarski theorem states that the unit ball can be decomposed into finitely 
many sets that, when rigidly moved in space, form two unit balls! This 
shows that not all subsets of the ball are measurable, even if one asks only 
for finite, rather than countable, additivity. For an excellent discussion of 
the paradoxical decompositions and their connections with other parts of 
mathematics, see Wagon (1985). 

The measure-theoretic consequences of the paradoxical decomposi- 
tions follow from the geometrically natural assumption that congruent sets 
have the same measure. If one drops this assumption and asks only for 
countable additivity and nontriviality (that is, not all subsets have mea- 
sure zero), then the conflict with the axiom of choice seems to disappear. 
No contradiction has yet been derived from these assumptions, but Ulam 
(1930) showed that any set possessing such a measure must be extraordinarily 
large: as large, in fact, as a model of set theory itself, and in 
particular larger than the cardinals $\aleph_1, \aleph_2, \ldots, \aleph_\omega, \ldots$. Thus if R has a 
nontrivial countably additive measure, then R must be far larger than $\aleph_1$, 
and we still have a conflict with the continuum hypothesis. (For more on 
the “largeness” of models, see Section 17.8.) 

A more desirable axiom than mere measurability would be Lebesgue 
measurability of all subsets of R. This conflicts with the axiom of choice, 
by Vitali’s theorem, but it was nevertheless shown to be consistent with the 
usual axioms of set theory by Solovay (1970), assuming the existence of a 
large cardinal. Shelah (1984) showed that the large cardinal assumption is 
necessary. 

Thus measurability of all subsets of R is intimately connected with 
the existence of sets large enough to model the whole of set theory. This 
mind-boggling concept seems to be the answer to many fundamental ques- 
tions. We will find ourselves drawn to it again in the next sections when 
we explore the influence of set theory on logic. Meanwhile, for a longer 
introduction to set theory, its history, and interactions with analysis, see 
Stillwell (2013). For recent developments in the theory of large cardinals, 
which some believe will throw new light on the continuum hypothesis, see 
Kanamori (1994) and Woodin (1999). 


EXERCISES 


The axiom of choice turns up even in elementary analysis, when one attempts 
to formalize the idea of a continuous function. A natural definition in terms of 
infinite sequences is equivalent to the standard $\varepsilon$-$\delta$ definition only if we assume 
the axiom of choice. 

Call f sequentially continuous at a if, for any sequence $\{a_n\}$ such that $a_n \to a$, 
we have $f(a_n) \to f(a)$. 


17.4.1 Show, assuming the axiom of choice, that if f is not continuous at a then f 
is not sequentially continuous at a. (It is a consequence of Cohen (1966), 
p. 138, that this result cannot be proved without the axiom of choice. 
It turns on the fact that countably many choices are required to prove 
that an infinite set contains a countable subset. The next exercise involves 
uncountably many choices.) 


Vitali’s decomposition of the circle is created as follows. For each $\theta$ between 
0 and $2\pi$ let $S(\theta)$ be the set of points on the unit circle whose angle differs from 
$\theta$ by a rational multiple of $2\pi$. Thus $S(\theta) = S(\phi)$ if $\theta - \phi = 2\pi \times$ a rational, and 
$S(\theta) \cap S(\phi) = \emptyset$ otherwise. 


17.4.2 Let S be a set (existing by virtue of the axiom of choice) that contains 
exactly one element from each distinct $S(\theta)$ and let 


$S + 2\pi r = \{\theta + 2\pi r : \theta \in S\}$ for each rational r. 


(Thus $S + 2\pi r$ is S rotated through the rational multiple $2\pi r$ of $2\pi$.) Show 
that any two of the sets $S + 2\pi r$ are either identical or disjoint. 


17.4.3 Show that the circle is a countable union of sets $S + 2\pi r$. 


17.4.4 Show that both assumptions $\mu(S) = 0$ and $\mu(S) > 0$ lead to contradictions, 
and hence conclude that S is nonmeasurable. 


17.5 The Diagonal Argument 


The uncountability of R was shown again in a strikingly simple way by 
Cantor (1891). His argument applies most directly to the set $2^N$ of all subsets 
of N, but there are variants that work similarly on the set $N^N$ of integer 
functions and on R (which can be identified with a set of integer functions 
in various ways). To show that there are uncountably many subsets of N 
one shows that any countable collection $S_0, S_1, S_2, \ldots$ of sets $S_n \subseteq N$ is 
incomplete, by constructing a new set S, different from each $S_n$. S is the 
diagonal set $\{n : n \notin S_n\}$, which obviously differs from $S_n$ with respect to 
the number n. Q.E.D. 
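
The construction of the diagonal set can be tried out on a finite prefix of a list of sets. The following is only a minimal sketch: the name `diagonal_set` and the sample list `sets` are illustrative assumptions, and a finite list can of course only hint at the infinite argument.

```python
def diagonal_set(sets):
    """Return {n : n not in S_n}, where sets[n] plays the role of S_n.

    Only the indices 0, 1, ..., len(sets) - 1 are considered, so this is
    a finite illustration of the infinite construction.
    """
    return {n for n, s in enumerate(sets) if n not in s}


# A hypothetical (finite) list S_0, S_1, ..., S_4 of sets of natural numbers.
sets = [{0, 2}, {1}, {0, 1}, set(), {4, 5}]
d = diagonal_set(sets)

# The diagonal set disagrees with each S_n about the number n itself.
for n, s in enumerate(sets):
    assert (n in d) != (n in s)
```

Whatever list is supplied, the set returned disagrees with the nth entry about the number n, which is all the argument needs.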

The “diagonal” nature of S can be seen by visualizing a table of 0’s 
and 1’s in which 

$$\text{$m$th entry in $n$th row} = \begin{cases} 0 & \text{if } m \notin S_n, \\ 1 & \text{if } m \in S_n. \end{cases}$$

In other words, the nth row consists of the values of the characteristic function 
of $S_n$. The characteristic function of S is simply the diagonal of the 
table, with all values reversed. A sequence $x_0, x_1, x_2, \ldots$ of real numbers 
can be diagonalized similarly by forming the table whose nth row consists 
of the decimal digits of $x_n$. A suitable way to “reverse” the digits on the 
diagonal is to change any 1 to a 2 and any other digit to a 1. (The resulting 
sequence of 1’s and 2’s, after a decimal point, then defines a real number 
x whose decimal expansion is unique. Hence x is not just different from 
each $x_n$ in its decimal expansion but is definitely a different number.) 

More generally, for any table of rows of integers, that is, any sequence 
of integer functions $f_n$, one can construct an integer function f unequal to 
each $f_n$ by changing the values along the diagonal of the table. The diagonal 
argument was in fact first given in this context, by du Bois-Reymond 
(1875), in order to construct an f with a greater rate of growth than all 
functions in a sequence $f_0, f_1, f_2, \ldots$ (Exercise 17.5.1). With hindsight, one 
can even see a diagonal construction in Cantor’s first (1874) argument for 
the uncountability of R (Exercise 17.5.2). 

The diagonal argument is important in set theory because it readily 
generalizes to show that every set has more subsets than elements (Exercise 
17.5.3), and hence that there is no largest set. What was not noticed 
at first is that the diagonal argument also has consequences at a more concrete 
level. This is because the diagonal of a table is computable if the table 
as a whole is computable. Hence the argument does not merely show how 
to add a new function f to a list $f_0, f_1, f_2, \ldots$; it shows how to add a new 
computable function to a computable list. In other words, it is impossible 
to compute a list of all computable functions. And of course the same goes 
for lists of computable real numbers. This remarkable result went unno- 
ticed in the early days of the diagonal argument because computability 
was not then regarded as an interesting concept, or indeed as a mathemat- 
ical concept at all. The controversies over the axiom of choice, however, 
helped to sharpen awareness of the difference between constructive and 
nonconstructive functions. In the 1920s logicians began to investigate the 
concept of computability more seriously, and by a “kind of miracle,” as 
Gödel (1946) later expressed it, computability turned out to be a mathematically 
precise notion. 
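
The claim that the diagonal of a computable table is itself computable can be made concrete with a small sketch. Here a "computable list" is modeled, as an assumption for illustration only, by a Python function `table(n, m)` returning the value $f_n(m)$; the names `diagonal` and `table` are hypothetical.

```python
def diagonal(table):
    """Given table(n, m) = f_n(m), return a function f with f(n) != f_n(n).

    If `table` is computable then so is f: computing f(n) takes one call
    to `table` and one addition.
    """
    return lambda n: table(n, n) + 1


def table(n, m):
    # A hypothetical computable list of total functions: f_n(m) = n * m.
    return n * m


f = diagonal(table)

# f differs from each listed f_n at the argument n.
assert all(f(n) != table(n, n) for n in range(100))
```

The sketch only works because every `table(n, m)` returns a value; the difficulty discussed in the next section is that a list of all machines includes some that never do.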


EXERCISES 


The diagonal construction is quite a natural way to construct a function or 
real number “larger” than the members of a given countable set. 


17.5.1 Given integer functions $f_0, f_1, f_2, \ldots$, define an integer function f such that 
$f(m)/f_n(m) \to \infty$ as $m \to \infty$, for each n. Hint: Arrange that $f(m) > n f_n(m)$ 
for all $m > n$. 


17.5.2 Show that if $a_0 < a_1 < a_2 < \cdots$ is a bounded sequence of real numbers, 
then a = least upper bound of $\{a_0, a_1, a_2, \ldots\}$ is a diagonal number of the 
sequence in the following sense. There are integers $k_0 < k_1 < k_2 < \cdots$ 
such that the decimal digits of a exceed those of $a_n$ after the $k_n$th place. 


The last exercise applies the diagonal construction to any set I, to show that I 
has more subsets than members, so there is no largest set. 


17.5.3 Let I be any set, and let $\{S_i\}$ be a collection of subsets of I in one-to-one 
correspondence with the elements i of I. Show that the natural diagonal set 
S of this collection is a subset of I unequal to each $S_i$. 


17.6 Computability 


The notion of computability was first formalized by Turing (1936) and 
Post (1936), who arrived independently at a definition of computing machine, 
now called a Turing machine. A Turing machine M is given by two finite 
sets, $\{q_0, q_1, \ldots, q_m\}$ of internal states and $\{s_0, s_1, \ldots, s_n\}$ of symbols, and 
a transition function T that formalizes the behavior of M for pairs $(q_i, s_j)$. 
The machine M is visualized as having an infinite tape, divided into 
squares, each of which can carry one of the symbols $s_j$. (For most purposes, 
M is assumed to start on a tape with all but finitely many squares 
blank; $s_0$ is taken to denote the blank symbol.) Depending on its internal 
state $q_i$ and the scanned symbol $s_j$, M will make a transition: changing $s_j$ to $s_k$, 
then moving one square right or left and going into a new state $q_l$. Thus the 
transition function is given by finitely many equations 


$T(q_i, s_j) = (m, s_k, q_l)$, 


where $m = \pm 1$ indicates a move to right or left. 

To use M to compute a function $f : N \to N$, we need to adopt some 
convention for inputs (arguments of f) and outputs (values of f). The simplest 
is shown in Figure 17.1. M starts in state $q_0$ on the leftmost 1 of a 
block of n 1s, on an otherwise blank tape, and halts on the leftmost 1 of a 
block of f(n) 1s, on an otherwise blank tape. M halts by virtue of entering 
a halting state, that is, a state $q_h$ for which M has no transition from the 
pair $(q_h, 1)$. A computable function f is one that can be represented in this 
way by a Turing machine M. 
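
The definition and the input/output convention can be tried out with a short simulator. What follows is a sketch under stated assumptions, not a standard implementation: the transition table is a Python dictionary sending (state, symbol) to (move, symbol, state), the blank symbol is 0, the state names and the successor machine `SUCC` are hypothetical, and a step budget is imposed because a machine need not halt.

```python
from collections import defaultdict


def run(transitions, n, max_steps=10_000):
    """Run a machine on a block of n 1s and return the length of the
    block of 1s on which it halts.

    transitions maps (state, symbol) to (move, new_symbol, new_state),
    with move = +1 (right) or -1 (left). The blank symbol is 0.
    """
    tape = defaultdict(int)          # every square is blank (0) by default
    for i in range(n):
        tape[i] = 1                  # the input block occupies squares 0..n-1
    state, pos = "q0", 0             # start in q0 on the leftmost 1
    for _ in range(max_steps):
        key = (state, tape[pos])
        if key not in transitions:   # no transition defined: the machine halts
            break
        move, symbol, state = transitions[key]
        tape[pos] = symbol
        pos += move
    else:
        raise RuntimeError("step budget exhausted; the machine may not halt")
    count = 0                        # read the block of 1s starting at the head
    while tape[pos + count] == 1:
        count += 1
    return count


# A hypothetical machine for f(n) = n + 1: add a 1 at the left end of the
# block, then step back onto it and stop in qh, which has no transitions.
SUCC = {
    ("q0", 1): (-1, 1, "q1"),
    ("q0", 0): (-1, 1, "q2"),   # empty input: write the single 1 directly
    ("q1", 0): (-1, 1, "q2"),
    ("q2", 0): (+1, 0, "qh"),
}

results = [run(SUCC, n) for n in range(5)]
```

The step budget is a practical device only. Deciding in general whether the loop would ever end is exactly the difficulty taken up below.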


Figure 17.1: Computing a function by Turing machine 


It follows that there are only countably many computable functions 
$f : N \to N$, since there are only countably many Turing machines. In fact, we 
can compute a list of all Turing machines by first listing the finitely many 
machines with one transition, then those with two transitions, and so forth. 
This may seem to contradict the discovery from the previous section that a 
list of all computable functions cannot be computed, but, as Turing (1936) 
realized, it does not. The catch is that not all machines define functions, 
and it is impossible to pick out all of those that do. Of course, it is possible 
to rule out any machine that halts in a situation unlike that in Figure 17.1; 
the difficulty is in knowing whether halting is going to occur at all. It is 
precisely this difficulty that prevents computation of the diagonal function. 

If it could be decided, for each machine M and each input, whether M 
eventually halts, then we could find the first machine to halt on input 1, 
the next after that to halt on input 2, the next after that to halt on input 3, 
and so on. By changing the corresponding outputs according to some rule 
(say, adding 1 if the output is a number, and taking the value 1 otherwise), 
we could compute a function different from each computable function. 

This contradiction shows that the problem of deciding, given a machine 
and an input, whether halting eventually occurs, is unsolvable. This prob- 
lem is called the halting problem and its unsolvability means that no Tur- 
ing machine can solve it. That is, if the questions “Does M on input n 
eventually halt?” are written in some fixed finite alphabet, then there is no 
machine that, given these questions as inputs, will give their answers as 
outputs. The point is that, as far as we know, all possible rules or algo- 
rithms for answering infinite sets of questions can be realized by Turing 
machines. This is the “kind of miracle” referred to by Gödel (1946). 
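
The unsolvability of the halting problem is often illustrated by a self-referential variant of the diagonal argument, different in form from the list-based argument given above. The sketch below assumes that Python functions stand in for Turing machines and that `halts` is a proposed decider; the names `make_contrary` and `always_no` are hypothetical. Any candidate decider is defeated by the program built from it.

```python
def make_contrary(halts):
    """Build a program that does the opposite of what `halts` predicts for it."""
    def contrary():
        if halts(contrary):
            while True:      # predicted to halt, so run forever
                pass
        return "halted"      # predicted to run forever, so halt at once
    return contrary


def always_no(program):
    # A candidate decider that claims no program ever halts.
    return False


contrary = make_contrary(always_no)
result = contrary()          # halts, contradicting the prediction of always_no
```

Only the case of a decider answering "no" can safely be executed. A decider that answers "yes" for its contrary program is wrong for the opposite reason: the program then loops forever, so that case is left to reasoning rather than to a run.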

Now that computers are everywhere, it is taken for granted that the 
word “computability” has a precise, absolute meaning—synonymous with 
Turing machine computability. It is even a familiar fact that all computa- 
tions can be done on a single, sufficiently powerful machine; this corre- 
sponds to the discovery of Turing (1936) of a universal Turing machine. 
However, these claims were surprising in the 1930s, particularly to Gödel, 
who had shown (1931) that the related notion of “provability” is not abso- 
lute. This will be discussed further in the next section. Briefly, the reason 
for the difference is that new computable functions cannot be created by 
diagonalization, whereas new theorems can. 


Unsolvable Problems 


The halting problem was of no obvious mathematical significance in 1936, 
but it seemed no more difficult than other unsolved algorithmic problems 
in mathematics. Thus for the first time it was reasonable to suspect that 
some ordinary mathematical problems were unsolvable. Moreover, if it 
could be shown that a solution of a particular problem P implied a solution 
of the halting problem, then the unsolvability of P would be rigorously 
established. This method was used to demonstrate the unsolvability 
of some problems in formal logic by Turing (1936) and Church (1936). 
Church (1938) also put forward a strong candidate for unsolvability in 
ordinary mathematics: the word problem for groups. 

This is the problem of deciding, given a finite set of defining relations 
for a group G (Section 14.7) and a word w, whether w = 1 in G. There is 
more than a superficial analogy between the word problem and the halting 
problem. The group G corresponds to a machine M, words in G correspond 
to expressions on M’s tape, and w = 1 corresponds to halting. The defin- 
ing relations of G roughly correspond to the transition function of M, but 
unfortunately there is no machine equivalent of the cancellation of inverses 
in G. This creates fierce technical difficulties, but they were overcome by 
Novikov (1955). He succeeded in establishing the validity of the analogy 
and hence the unsolvability of the word problem. This led to unsolvability 
results for a host of significant mathematical problems, among them the 
homeomorphism problem mentioned in Section 15.5. (The reference given 
there, Stillwell (1993), also includes a proof of the unsolvability of the 
word problem.) 


EXERCISES 


Turing (1936) actually discovered the unsolvability of the halting problem 
by considering computable real numbers and applying the diagonal argument to 
them. The argument is similar to the one above using computable functions, but a 
little messier. Define a real number x to be computable if there is a Turing machine 
M that represents x in the following manner. 


• Starting on a blank tape, M prints the decimal digits of x on successive 
squares of tape, eventually filling each square to the right of the square 
initially scanned (if necessary, printing all 0s beyond a certain point). 


• The squares to the left may be used, and reused, for preliminary computation, 
but squares to the right, once written, may not be rewritten. 


17.6.1 Show that there is no algorithm for recognizing the Turing machines that 
define real numbers in this way, since such an algorithm would give a way 
to compute a number different from every computable number. 


17.6.2 Explain informally how each Turing machine M may be converted to a 
machine M’ such that M defines a computable number if and only if M’ 
does not halt. 


17.6.3 Hence prove that no Turing machine can solve the halting problem. 


17.7 Logic and Gödel’s Theorem 


Since the time of Leibniz, and perhaps earlier, attempts have been made 
to mechanize mathematical reasoning. There was little success until the 
late 19th century, when reduction of the many concepts of number, space, 
function, and the like, to the single concept of set simplified the axioms 
that seemed to be necessary for mathematics. At about the same time, 
investigation of the principles of logic by Boole (1847), and particularly 
Frege (1879), led to a system of rules by which all logical consequences 
of a given set of axioms could be derived. These two lines of investigation 
together offered the possibility of a complete, rigorous, and, in principle, 
mechanical system for deriving all mathematics. 

The Principia Mathematica of Whitehead and Russell (1910) was a 
massive attempt to realize this possibility. Principia used axioms about 
sets, together with simple rules of inference, to derive a large part of ordi- 
nary mathematics in a completely formal language. When Whitehead and 
Russell began writing the Principia in 1900, they believed that they were 
about to reach the 19th-century goal of completeness and absolute rigor. 
They did not know that the rigor of their system (the ability to check 
proofs mechanically) was in fact incompatible with completeness. Gödel 
(1931) found true sentences expressible in the language of Principia that 
do not follow from its axioms. (Unless Principia is inconsistent, in which 
case all sentences follow and the system is useless.) 

Gödel’s theorem created a sensation when it first appeared. It shattered 
previous conceptions of mathematics and logic, and its proof was of a new 
and bewildering kind. Gödel exploited the mechanical nature of proof in 
Principia to define the relation “the nth sentence of Principia is provable” 
in the language of Principia itself. Using this, he was able to concoct a 
sentence that says, in effect, “This sentence is not provable.” The Gödel 
sentence, if true, is therefore not provable. And if false, it is provable, and 
so Principia proves a false sentence. Either way, provability in Principia 
is not the same as truth. 

Gödel’s proof was very difficult for his contemporaries to understand. 
Along with the novelty of treating sentences and proofs as mathematical 
objects was the near inconsistency of a sentence expressing its own 
unprovability (a sentence that says “This sentence is not true” is inconsistent). 
Post (1944) presented Gödel’s theorem less paradoxically, and tied 
it to computability theory, by using the classical diagonal argument. 


Post’s Approach to Gödel’s Theorem 


The key to Post’s approach is the concept of a computably enumerable set 
(called recursively enumerable in Post’s time). A set W is computably enu- 
merable if a list of its members can be computed, say by a Turing machine 
that prints them on its tape. (Of course if W is infinite, the computation 
runs forever.) A typical computably enumerable set is the set of theorems 
of a formal system, such as Principia Mathematica. For such a system one 
can list all sentences, then all finite sequences of sentences, and then, by 
picking out those sequences that are proofs, make a list of all theorems— 
since a theorem is simply the last line of a proof. 
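
The listing procedure just described can be sketched for a toy formal system. Everything specific here is an assumption made for illustration: the alphabet, the single axiom "a", the two rules in `follows`, and the encoding of a finite sequence of sentences as one string with "|" as separator. The shape of the computation follows the text: list all finite sequences of sentences, keep those that are proofs, and output their last lines.

```python
from itertools import count, islice, product

ALPHABET = "ab"      # symbols of the toy language (hypothetical)
AXIOMS = {"a"}       # the only axiom of the toy system (hypothetical)


def follows(prev, line):
    """Toy rules of inference: from x infer x + 'b', and from x infer 'a' + x."""
    return line == prev + "b" or line == "a" + prev


def is_proof(lines):
    """Each line must be an axiom or follow from some earlier line."""
    for i, line in enumerate(lines):
        if line not in AXIOMS and not any(follows(p, line) for p in lines[:i]):
            return False
    return True


def sequences():
    """List all finite sequences of nonempty sentences, shortest encodings first."""
    for length in count(1):
        for chars in product(ALPHABET + "|", repeat=length):
            parts = "".join(chars).split("|")
            if all(parts):               # discard encodings with empty sentences
                yield parts


def theorems():
    """List the theorems: the last lines of proofs, without repetition."""
    seen = set()
    for lines in sequences():
        if is_proof(lines) and lines[-1] not in seen:
            seen.add(lines[-1])
            yield lines[-1]


first_theorems = list(islice(theorems(), 3))
```

The generator `theorems()` never finishes, in keeping with the remark that the computation runs forever when the set is infinite; `islice` merely cuts off a finite initial segment of the list.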

Post’s idea was to look at the theorems about computably enumerable 
sets proved in a given system $\Sigma$ and to compute a “diagonal sentence” 
from them. Since computably enumerable sets are associated with Turing 
machines, it is possible to enumerate the computably enumerable subsets 
of N as $W_0, W_1, W_2, \ldots$ by letting $W_n$ be the set of numbers output by 
the nth machine, under some reasonable convention. (Incidentally, there is 
no problem of picking out suitable machines, as there is for computable 
functions, since we do not mind if $W_n$ is empty.) The diagonal set 


$D = \{n : n \notin W_n\}$, 


being unequal to each $W_n$, is of course not computably enumerable, but 
the following set is: 


$\mathrm{Pr}(D) = \{n : \Sigma \text{ proves “} n \notin W_n \text{”}\}$. 


This “provable part” of D is computably enumerable because we can list 
the theorems of $\Sigma$ and select those of the form “$n \notin W_n$.” Assuming that 
$\Sigma$ proves only correct sentences we have $\mathrm{Pr}(D) \subseteq D$, but $\mathrm{Pr}(D) \ne D$ since 
Pr(D) is computably enumerable and D is not. This shows immediately 
that there is an $n_0$ in D that is not in Pr(D), that is, an $n_0 \notin W_{n_0}$ for which 
“$n_0 \notin W_{n_0}$” is not provable. 

Better still, a specific $n_0$ with this property is the index of the computably 
enumerable set Pr(D). If $W_{n_0} = \mathrm{Pr}(D)$, then $n_0 \in W_{n_0}$ is equivalent 
to $n_0 \in \mathrm{Pr}(D)$, which means that “$n_0 \notin W_{n_0}$” is provable. But then it is true 
that $n_0 \notin W_{n_0}$, assuming that $\Sigma$ proves only correct sentences, and we have 
a contradiction. Thus $n_0 \notin W_{n_0}$. This in turn is equivalent to $n_0 \notin \mathrm{Pr}(D)$, 
which means “$n_0 \notin W_{n_0}$” is not provable. (Notice, incidentally, that the last 
part of this argument reveals “$n_0 \notin W_{n_0}$” to be a sentence that expresses its 
own unprovability.) 

Post was aware of this approach to Gödel’s theorem in the 1920s, 
before Gödel’s own proof appeared. However, Post’s more general view of 
incompleteness as a property of arbitrary computably enumerable systems 
held him up until he was satisfied that computability was a mathematically 
definable concept. In December 1925 Post formulated a plan for proving 
Principia Mathematica incomplete but, as he later wrote, “The plan, how- 
ever, included prior calisthenics at other mathematical and logical work, 
and did not count on the appearance of a Gödel!” (Post (1941), p. 418). 


The Unprovability of Consistency 


Gödel’s theorem comes from reflecting on the nature of proofs. An even 
more devastating theorem, known as Gödel’s second theorem, comes from 
reflecting on the proof of Gödel’s theorem itself. The latter proof, unusual 
though it is, can be expressed in ordinary mathematical language. 

We described Post’s proof of Gödel’s theorem in an informal language 
of Turing machines. But with some effort it can be expressed in the system 
for number theory called Peano arithmetic (PA), mentioned in Section 3.3. 
Indeed, this arithmetization of syntax was one of Gödel’s greatest ideas. By 
doing his proof in PA, he exposed the incompleteness of classical mathematics. 
Turing machines can be discussed in PA by encoding sequences 
of symbols on the tape as numerals, so that machine operations become 
operations on numbers. Under this encoding, “$n_0 \notin W_{n_0}$” and “$\Sigma$ does not 
prove ‘$n_0 \notin W_{n_0}$’” become sentences of PA. 

Here it is important to recall the assumption about $\Sigma$ used to prove 
Gödel’s theorem: $\Sigma$ proves only correct sentences. This assumption cannot 
be dropped (since a false sentence implies all sentences), but it can be 
weakened to the assumption that $\Sigma$ does not prove the sentence “0 = 1.” 
The latter assumption says that a certain number (the number of the sentence 
“0 = 1”) is not in a certain computably enumerable set (the set of 
theorems of $\Sigma$), so it can be expressed as a sentence of PA, call it Con($\Sigma$). 
In particular, PA expresses its own consistency by the sentence Con(PA). 
Gödel’s theorem for $\Sigma$ = PA then becomes the following sentence of PA: 


Con(PA) $\Rightarrow$ PA does not prove “$n_0 \notin W_{n_0}$.” 


As we know, the sentence “$n_0 \notin W_{n_0}$” is equivalent to its own unprovability, 
so an equivalent of the last sentence is simply 


Con(PA) $\Rightarrow$ $n_0 \notin W_{n_0}$. 

Now Gödel noticed that his proof could be carried out in PA. This happened 
after some prompting from von Neumann (1930), who noticed the 
same thing. (The rather laborious verification was carried out by Hilbert 
and Bernays (1939).) Consequently, if Con(PA) can be proved in PA, then 
so can “$n_0 \notin W_{n_0}$,” by basic logic. But if PA is consistent, “$n_0 \notin W_{n_0}$” 
cannot be proved in it, by Gödel’s theorem, hence neither can Con(PA). 
(Gödel of course had a different unprovable sentence, but it was similarly 
implied by Con(PA), and equivalent to its own unprovability.) 

Thus the assertion Con(PA) that the axioms of PA are consistent is 
in some way stronger than the axioms themselves. Similarly, if $\Sigma$ is any 
system that includes PA (such as Principia Mathematica and other systems 
of set theory), then Con($\Sigma$) cannot be proved in $\Sigma$, if $\Sigma$ is consistent. This 
is Gödel’s second theorem. 


EXERCISES 


It is instructive to spell out why the sentence “$n_0 \notin W_{n_0}$” expresses its own 
unprovability, if this is not already obvious. 


17.7.1 Fill in the gap so as to establish a chain of equivalences: 


$n_0 \notin W_{n_0} \Leftrightarrow \cdots \Leftrightarrow \Sigma$ does not prove “$n_0 \notin W_{n_0}$”. 


A remarkable new form of Gödel’s theorem was discovered by Chaitin (1970). 
Like Gödel’s own version, it is most easily explained in terms of computation. 
Let us call a finite sequence $\sigma$ of 0s and 1s computationally random if it cannot 
be produced (from a blank tape) by a Turing machine whose description is shorter 
than $\sigma$. To compare lengths fairly we assume that Turing machines are themselves 
written as sequences of 0s and 1s. (This makes the definition of “computationally 
random” dependent on the way we encode Turing machines, but never mind: the 
proof of Chaitin’s theorem assumes only that the method of encoding is computable.) 


17.7.2 Give an informal argument to explain why the sequence of $10^{100}$ consecutive 
0s is not computationally random. 


17.7.3 Show that at most $2^n - 1$ Turing machines have descriptions of length less 
than n. 


17.7.4 Deduce from Exercise 17.7.3 that there are infinitely many computationally 
random sequences. 
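
The count in Exercise 17.7.3 can be checked by brute force for small n. This is a sanity check rather than a proof, and it counts all binary strings of length less than n (including the empty string), only some of which would describe Turing machines under any given encoding; the function name is hypothetical.

```python
from itertools import product


def strings_shorter_than(n):
    """All sequences of 0s and 1s of length 0, 1, ..., n - 1."""
    return ["".join(bits) for k in range(n) for bits in product("01", repeat=k)]


# 1 + 2 + 4 + ... + 2^(n-1) = 2^n - 1 strings of length less than n.
counts = [len(strings_shorter_than(n)) for n in range(1, 8)]
assert counts == [2**n - 1 for n in range(1, 8)]
```

Since there are $2^n$ sequences of length n but fewer candidate descriptions of length less than n, at least one sequence of each length has no shorter description.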


Despite the prevalence of computationally random sequences, they are very hard 
to find. Chaitin’s incompleteness theorem states: any sound formal system proves 
only finitely many theorems of the form “$\sigma$ is computationally random.” 

To prove Chaitin’s theorem suppose, on the contrary, that there is a formal 
system, and hence a Turing machine M, that generates infinitely many theorems 
of the form “$\sigma$ is computationally random,” and no false statements of this form. 
Suppose, for example, that M has length 10°. 


17.7.5 Explain informally how to convert M to a machine M’ that finds the first 
theorem of the form “$\sigma$ is computationally random” output by M, where $\sigma$ 
has at least 10! digits. 


17.7.6 Also explain informally why the length of M’ is less than 10!, 


17.7.7 Deduce from Exercise 17.7.6 that we have a contradiction; hence M does 
not exist. 


17.8 Provability and Truth 


The previous section stressed that Gödel’s theorem is a statement of alternatives: 
a formal system $\Sigma$ either fails to prove a true sentence or else 
proves a false one. Gödel’s second theorem identifies a sentence, Con($\Sigma$), 
which is either true and unprovable or false and provable, but does not 
say which alternative holds for a particular $\Sigma$, such as PA or Principia. 
How could it, without violating Gödel’s theorem itself? Unless $\Sigma$ actually 
is inconsistent, there can be no proof in $\Sigma$ that Con($\Sigma$) is true! 

Nevertheless, Gödel’s theorem tells us that we have nothing to lose 
by adding Con($\Sigma$) to the system $\Sigma$. If $\Sigma$ is inconsistent, then it is already 
worthless, and we are no worse off for having added Con($\Sigma$). And if $\Sigma$ is 
consistent, we actually gain, because Con($\Sigma$) is a new mathematical truth 
not provable from $\Sigma$ alone. In this way, Gödel’s theorem allows us to transcend 
any given formal system. Knowing that Con($\Sigma$) is beyond the scope 
of $\Sigma$ (if $\Sigma$ is consistent) is of practical value to mathematicians, for it means 
there is no point trying to prove any sentence that implies Con($\Sigma$). If one 
wants to use such a sentence, it should be taken as a new axiom. 

Sentences of mathematical interest actually arise in this way, most sim- 
ply in set theory, where consistency is implied by the existence of a “large 
set.” The usual axioms of set theory (called the Zermelo–Fraenkel, or ZF, 
axioms) say roughly that 


(i) N is a set. 


(ii) Further sets result from certain operations, the most important of 
which are power (taking all subsets of a set) and replacement (taking 
the range of a function whose domain is a set). 

Because of this, the axioms of ZF can be modeled by any set that contains 
N and is closed under power and replacement. Such a set has to be very 
large—larger than any set whose existence can be proved in ZF—but if 
it exists then ZF must be consistent, since two contradictory sentences 
cannot be true of an actually existing object. Thus the existence of a set 
that is large in the above sense implies Con(ZF). 

If ZF is consistent, then ZF + Con(ZF) is also consistent, but an even 
larger set is required to satisfy the enlarged axiom system. These large-set 
existence axioms are called axioms of infinity. Since they imply Con(ZF), 
they cannot be proved in ZF. In particular, one cannot prove the exis- 
tence of a nontrivial measure on all subsets of R since, as mentioned in 
Section 17.3, this implies the existence of a large set. Gödel (1946) made 
the interesting speculation that any true but unprovable proposition is a 
consequence of some axiom of infinity. 

More recently, some largeness properties in number theory have been 
found to imply Con(PA). The first of these was found by Paris and Harrington 
(1977), using a modification of a combinatorial theorem of Ramsey 
(1929). Paris and Harrington found a sentence $\sigma$ that says that for each 
$n \in N$ there is an m such that sets of size $\ge m$ have a certain combinatorial 
property C(n). They showed that $\sigma$ follows from a well-known theorem 
on infinite sets, called Ramsey’s theorem, but that the function 


f(n) = least m such that sets of size m have property C(n) 


grows faster than any computable function whose existence can be proved 
in PA. Thus $\sigma$ in some sense asserts the existence of a large function. The 
property C(n) is such that one can decide whether a finite set has it or not; 
hence $\sigma$ implies (very simply, and certainly in PA) that f is computable. 
This shows immediately that $\sigma$ cannot be proved in PA, but Paris and Harrington 
in fact proved the stronger result that $\sigma$ implies Con(PA). For an 
excellent introduction to Ramsey theory and the Paris-Harrington theorem, 
see Katz and Reimann (2018). 

Gödel’s theorem shows that something is missing in the formal view 
of mathematics, and the axioms of infinity show that the missing elements 
may be mathematically interesting and important. Despite this, it is com- 
monly thought that mathematics consists in the formal deduction of theo- 
rems from fixed axioms. As early as 1941 Post protested against this view: 


It is to the writer’s continuing amazement that ten years after 
Gödel’s remarkable achievement current views on the nature 
of mathematics are thereby affected only to the point of seeing 
the need of many formal systems, instead of a universal one. 
Rather has it seemed to us to be inevitable that these develop- 
ments will result in a reversal of the entire axiomatic trend of 
the late 19th and early 20th centuries, with a return to meaning 
and truth. 


Post (1941), p. 345 


Things have indeed not turned out as Post expected—the “axiomatic 
trend” rolls on—but there has been a “reversal” of sorts. The last 40 years 
have seen the development of reverse mathematics, the aim of which is to 
find the “right” axioms to prove given theorems, in the sense given by the 
seminal work of Friedman (1975): 


When a theorem is proved from the right axioms, the axioms 
can be proved from the theorem. 


For example, Euclid’s parallel axiom is the right axiom to prove the theorem 
of Pythagoras, the theorem that the angle sum of a triangle is $\pi$, and 
many other geometric theorems, because it can be shown that these the- 
orems imply the parallel axiom. More precisely, the parallel axiom can 
be proved from them assuming only the other axioms of Euclid. These 
implications were known long ago, but they became interesting only when 
Beltrami (1868a) showed that the parallel axiom is not provable from 
Euclid’s other axioms. Thus a reverse mathematics of Euclidean geometry 
becomes possible with the discovery that the parallel axiom is independent 
of Euclid’s other axioms. 

Modern reverse mathematics begins with the discoveries of Post, Turing, 
and Gödel that certain real numbers are not computable. From this it 
follows, using the relations they found between logic, computation, and 
arithmetic, that certain axioms about infinite sets of natural numbers are 
independent of basic axioms about the natural numbers. (The basics are, 
roughly, PA plus an axiom stating the existence of computable sets.) It 
turns out, surprisingly, that a small number of these seemingly obscure 
axioms about infinity are the “right” axioms to prove standard theorems 
about real numbers and continuous functions. 

Reverse mathematics today covers not only analysis, but also parts of 
topology, combinatorics, and algebra. For an introduction, see Stillwell 
(2018), and for a more encyclopedic treatment, see Simpson (2009). 
EXERCISES


An argument for the unprovability of the existence of “large” sets that does not
assume the unprovability of consistency was discovered by Zermelo in 1928 (Zermelo’s
announcement is mentioned in Baer (1928)). Since this was before Gödel’s own
work, it seems fair to call this Zermelo’s incompleteness theorem. It states that, if
“large” sets exist, then this fact is not provable in ZF.

To pave the way for Zermelo’s argument, we need to explain how ordinals
measure the “complexity level” (called the rank) of sets. The simplest set is the
empty set 0, which is assigned rank 0. For each ordinal α, the sets of rank ≤ α + 1
are those of rank ≤ α, together with all subsets of the set of sets of rank ≤ α.
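For finite sets, the recursion amounts to saying that the empty set has rank 0 and any other set has rank one more than the largest rank of its members. The following Python sketch is an illustration only, not part of the original text; it models hereditarily finite sets as nested `frozenset` values, and the helper names `rank`, `von_neumann`, and `sets_of_rank_at_most` are hypothetical. It can be used to check small cases by machine, but it is no substitute for the proofs asked for in the exercises.

```python
from itertools import combinations


def rank(s: frozenset) -> int:
    """Rank of a hereditarily finite set: 0 for the empty set, otherwise
    one more than the largest rank among its members."""
    return 0 if not s else 1 + max(rank(x) for x in s)


def von_neumann(n: int) -> frozenset:
    """The finite ordinal n, built as n = {0, 1, ..., n - 1}."""
    ordinal = frozenset()
    for _ in range(n):
        # Successor step: n + 1 = n together with the extra member n.
        ordinal = ordinal | frozenset([ordinal])
    return ordinal


def sets_of_rank_at_most(n: int) -> set:
    """All sets of rank <= n, built stage by stage: start from the empty
    set, then repeatedly add every subset of the previous stage."""
    stage = {frozenset()}
    for _ in range(n):
        members = list(stage)
        subsets = {
            frozenset(c)
            for k in range(len(members) + 1)
            for c in combinations(members, k)
        }
        stage = stage | subsets
    return stage


if __name__ == "__main__":
    print([rank(von_neumann(n)) for n in range(5)])
    print([len(sets_of_rank_at_most(n)) for n in range(4)])
```

The stages grow very quickly (each is the power set of the one before), so `sets_of_rank_at_most` is practical only for n up to about 4.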


17.8.1 Show that 1 = {0} has rank 1, and more generally that n + 1 = {0,1,...,n} 
has rank n + 1. 


If λ is an ordinal not of the form α + 1, the sets of rank ≤ λ are those of rank
α < λ, together with all subsets of the set of sets of rank < λ.


17.8.2 Show that the ordinal ω = {0,1,2,...} has rank ω.
17.8.3 More generally, show that any ordinal α has rank α.


It is essentially an axiom of ZF (the axiom of foundation) that every set has a rank. 

An ordinal λ is called inaccessible if the sets of rank < λ are closed under the
power and replacement operations. Thus, if an inaccessible λ exists, the sets of
rank < λ form a model of ZF. Also, if inaccessible ordinals exist, there is a least
inaccessible, μ.


17.8.4 Show that the sets of rank < μ are a model of ZF plus the sentence “there
is no inaccessible ordinal.” 


17.8.5 Deduce from Exercise 17.8.4 that, if inaccessible ordinals exist, this fact is 
not provable in ZF. 


For a wide-ranging introduction to the interplay between logic and infinity, 
see Stillwell (2010b). 
Alberti, 100
algebra, 64
abstract, 257 
and algebraic geometry, 65 
linear, 14, 298
origin of word, 64 
algebraic 
definition, 150
field, 318 
fractional power series, 150 
power series, 146 
geometry, 14, 29 
integer, 306
number, 304 
number fields, 303 
number theory, 297 
numbers
form countable set, 325, 326 
algebraic geometry, 14, 86 
and algebra, 65 
discovery, 87
algorithm 
Euclidean, 39 
origin of word, 64 
theory, 295, 323 
analytic geometry, 87, 97
and foundations, 97 
anamorphosis, 103 
angle division, 75 
and complex numbers, 185 
Leibniz formula, 76
Newton formula, 76
Viète formulas, 76
and continued fractions, 45
antiderivative, 138 
Apéry, 154 
Apollonius 
and conic sections, 86 
epicycles, 31 
four-line problem, 87 
theory of irrationals, 71 
arc length, 88, 226 
and elliptic integrals, 170 
integral, 144, 228 
of catenary, 230 
of circle, 144 
of cycloid, 227 
of lemniscate, 171 
of logarithmic spiral, 227 
of semicubical parabola, 227 
Archimedes 
and geometric series, 140 
and Pell’s equation, 45 
and volume of sphere, 127 
area of parabola, 51, 60, 123 
cattle problem, 45 
Method, 53, 126 
results on the sphere, 60 
spiral, 128 
area 
of circle, 57 
of hyperbola, 62 
of hyperbolic circle, 240 
of parabola, 60 
of polygons, 58 
of sphere, 60 
of triangle, 56 
proportional to square, 57 
Argand, 190 
Aristotle 
Prior Analytics, 13 
version of Zeno, 52 


arithmetic-geometric mean, 178 


and Gauss, 178 

and Lagrange, 178 
arithmetization 

of geometry, 96, 316 
of syntax, 341
associative law, 258, 308 
asymptotic lines, 238 
axiom of choice, 331 
and continuous functions, 333 
implies well-ordering, 332 
in measure theory, 332 
statement, 331 
axiom of foundation, 346 
axioms, 17, 18 
choice, 331 
for groups, 258 
in Euclid’s Elements, 18, 225 
large cardinal, 331 
of infinity, 344 
of set theory, 328, 331, 343 
parallel, 17, 225, 237 


B 
Bachet 
Diophantus, 158 
edition of Diophantus, 157 
stated four-square theorem, 37 
Banach, 332 
Banach–Tarski theorem, 332
Beeckman, 94 
Beltrami, 234 
conformal models, 248 
half-space model, 251 
hyperbolic plane, 241 
Berkeley, 53 
Bernays, 342 
Bernoulli 
definition of geodesic, 235 
Jakob 
and elliptic integrals, 171 
and logarithmic spiral, 229, 230 
lemniscate, 30, 171 
Johann 
and Σ 1/n, 152
and complex logarithms, 206 
and complex numbers, 186 
and tractrix, 230 
Bessel, 178, 212 
Bézout’s theorem, 85, 94, 96, 99
and fundamental theorem 
of algebra, 193, 195 
homogeneous formulation, 120 
stated by Newton, 95 
binomial 
coefficient, 79, 149 
as number of combinations, 80 
divisibility property, 83 
sum property, 81 
series, 149 
theorem, 123, 132, 146 
and Fermat’s little theorem, 83 
and interpolation, 147 
Bolyai hyperbolic geometry, 243 
Bolzano, 191 
intermediate value 
theorem, 191, 324 
Bombelli, 157, 183 
Bonnet, 236 
Boole, 339 
Borel, 330 
Bosse, 105 
Brahana, 289 
Brahmagupta 
and Pell’s equation, 45 
quadratic formula, 68 
branch point, 200 
Briggs, 148 
Bring, 78 
Brouncker, 132 
and Pell’s equation, 43 
continued fraction, 132 
Brunelleschi, 100 


C

calculus, 87, 123, 124 
and differential geometry, 226 
and interpolation, 148 
and mechanics, 124 
and method of exhaustion, 124 
and tangents, 124 
fundamental theorem, 137 
of Leibniz, 136 
of Newton, 124, 133
priority dispute, 136 
Cantor 
continuum hypothesis, 328 
defined ℵ₀, ℵ₁, ℵ₂, ..., 328
discovered uncountability, 325 
first uncountability proof, 325 
limit point operation, 326 
ordinal generating operations, 327 
transcendental numbers, 326 
Cardano, 73 
and complex numbers, 76, 183 
solution of cubic, 74 
cardinality, 327 
cardinals, 328 
ℵ₀, ℵ₁, ℵ₂, ..., 328
large, 331, 332 
uncountable, 328 
cardioid, 33 
Cassini, 30 
Cassini oval, 30 
catenary 
and tractrix, 228, 230 
arc length, 230 
cattle problem, 45 
Cauchy 
and permutation groups, 267 
integral theorem, 205, 212 
notation for identity, 267 
notation for inverse, 267 
polygonal number theorem, 37 
Cauchy–Riemann equations, 205, 209
and hydrodynamics, 208 
Cavalieri 
and volume of sphere, 127 
integration formula, 126 
method of indivisibles, 126 
Cayley 
abstract group concept, 267 
and projective geometry, 273 
permutation group theorem, 268 
projective model, 247 
Chaitin incompleteness theorem, 342 
choice function, 331 
chord-tangent construction, 7,
47, 157, 165 
Church, 338 
circle division, 25, 179 
circular functions 
and complex logarithms, 206 
and complex numbers, 186 
and cubic equations, 75 
and elliptic functions, 168 
and the circle, 168 
partial fraction series, 216 
circumradius, 22 
cissoid, 29 
cusp, 89 
classification 
of surfaces, 287 
Clebsch, 165, 170 
addition of points, 220 
Cohen, 328 
combinatorics, 124 
common notions, 19 
and equivalence relations, 20, 
210 
commutative law, 308 
complex curves 
and Newton–Puiseux
theory, 203 
as Riemann surfaces, 198 
topology, 201 
complex functions, 206 
and differentiability, 209 
and integration, 212 
as power series, 209 
real and imaginary parts, 208 
complex numbers, 181 
and angle division, 185 
and circular functions, 186 
and cubic equations, 76, 183 
and elliptic functions, 168, 177 
conjugate, 190 
geometric properties, 190 
geometric representation 
by Argand, 191 
by Cotes, 187 
by Wessel, 191
composition of functions, 258 
computability, 323, 335 
and diagonal argument, 334 
by Turing machine, 335 
of functions, 336 
of real numbers, 338 
computably enumerable set, 340 
computation, 323 
and randomness, 342 
conformal mapping, 205 
and mapmaking, 210 
conformal model, 248 
as part of C, 254 
disk, 249 
half-plane, 249 
distance, 250 
hemisphere, 248 
in half-space, 251 
congruence 
and groups of motions, 272 
modulo n, 259 
modulo an irreducible polynomial, 
312 
conic sections, 17, 26 
attributed to Menaechmus, 27 
in Descartes, 26 
meaning of names, 26 
projective view, 99, 110 
second-degree equations, 85, 87 
conjugates, 190 
constructible 
number, 26, 315 
has degree 2*, 316 
points, 70 
polygons, 25 
construction 
of equations, 94 
ruler and compass, 17, 23 
of double circle arc, 173 
of double lemniscate arc, 174 
continued fraction 
and Pell’s equation, 45 
definition, 46 
for π, 132
periodic, 47 
continuity, 190 
and axiom of choice, 333 
continuous 
functions, 191 
extreme value theorem, 191 
intermediate value theorem, 
191 
magnitude, 55 
Dedekind definition, 55 
process, 3 
continuum hypothesis, 328, 333 
consistency, 328 
independence, 328 
coordinates, 1, 6, 13, 85 
in Hipparchus, 86 
in Oresme, 86 
in vector space, 300 
coset, 261 
multiplication, 262 
Cotes, 187 
and complex logarithms, 207 
and complex numbers, 187 
Harmonia mensurarum, 207 
theorem on n-gon, 187 
countability, 325 
countable additivity, 330 
counting board, 66 
covering, 200 
of orientable surface, 291 
of projective plane, 290 
of pseudosphere, 290 
of torus, 290 
projection map, 202 
sheets of, 200 
and integration, 215 
universal, 290 
Cramer 
and Bézout’s theorem, 96 
and permutations, 266 
Cramer’s rule, 66, 298 
cross-ratio, 99 
and hyperbolic distance, 273 


as a group invariant, 273 

in Desargues, 105 

in Pappus, 106 

invariance, 115, 117 

Möbius invariance proof, 107

on finite projective line, 280 
cryptography 

and Fermat’s little theorem, 82 
cube, 21 
duplication of see duplication of the cube, 24

rotation group, 269 

cubic curves, 29, 85 
as tori, 203 
five types, 92, 99 
geometric features, 85, 89 
isomorphic to C/Λ, 218
Newton classification, 91 
of genus 0, 163 
parameterization, 165, 169 
projective classification, 221 
projective view, 110 

cubic equations, 63, 73 
and circular functions, 75 
and complex functions, 206 
and complex numbers, 76, 183 
and trisection, 75 
have real roots, 183 
in Cardano, 74 
in Viète, 75
solution, 73 

curvature, 124 
center of, 230 
constant 

surface of, 225, 245 

Gaussian, 225, 233 
geodesic, 236 
intrinsic, 232 
negative 


and non-Euclidean geometry, 225, 


234 
surface of, 225, 233 
Newton formula, 230 
of plane curves, 229 
of surfaces, 232

principal, 232 

radius of, 230 
curve 

algebraic, 32, 89, 193, 226 

behavior at infinity, 108 

cubic, 29, 91 

degree, 89 

equidistant, 247 

in conformal model, 249 

geometric, 89 

mechanical, 87, 89, 226, 228 

on projective plane, 113 

projective, 108 

transcendental, 89, 226 

and differential geometry, 226 

cusp, 85, 89 

of cissoid, 29, 89 

of semicubical parabola, 92 
cycloid, 32 

arc length, 227 

is own involute, 230 

pendulum, 230 


D 
d’Alembert
and complex functions, 208 
and conjugate solutions, 190 
fundamental theorem of 
algebra, 190 
lemma, 191 
on algebra in geometry, 91 
de Moivre 
formula, 76 
inversion formula, 135 
solution by radicals, 76 
Dedekind 
and algebraic number fields, 303 
cut, 55, 191, 324 
for irrational, 55 
for rational, 55 
definition of √2, 55
definition of continuity, 55 
definition of ideal, 318 
dimension theorem, 313
rigor, 53 
rings, 322 
tribute to Kummer, 317 
degree 
of algebraic number, 305 
of curve, 17, 85, 89 
Dehn 
and hyperbolic geometry, 295 
combinatorial group theory, 278 
solved Hilbert’s third problem, 58 
Desargues, 103 
and cross-ratio, 105 
Brouillon projet, 105 
projective geometry, 105 
theorem, 99, 105 
and foundations, 106 
statement, 106 
Descartes, 85 
and algebraic geometry, 13, 87 
coordinate method, 19 
factor theorem, 79, 189 
folium, 89 
Géométrie, 87 
integration formula, 126 
notation for powers, 79 
polyhedron formula, 285 
descriptive geometry, 104 
determinant, 66, 120, 298 
theory, 299 
and algebraic integers, 306 
diagonal argument, 334 
and computability, 334 
and Gödel’s theorem, 339
and rate of growth, 334 
for real numbers, 334 
for sets, 334 
differentiability, 205 
differential equations 
for geodesics, 235 
differential geometry, 226 
and calculus, 226 
and curvature, 229 
and hyperbolic geometry, 244 
differentiation, 124
dimension of vector space, 300, 
314 
and basis, 301 
over Q, 304, 315 
Diocles, 29 
Diophantine 
equations, 7, 35 
cubic, 48 
linear, 42 
no algorithm, 7, 36 
quadratic, 43 
rational solutions, 7 
problems, 7 
Diophantus, 4, 35 
and Diophantine problems, 7 
and Pythagorean triples, 8 
chord and tangent methods, 65, 
157 
chord method, 47, 48 
on folium, 90 
method, 7 
and elliptic functions, 165 
and Fermat, 7 
and Newton, 7 
geometric interpretation, 48 
tangent method, 47, 48, 129 
and Viète, 49
Dirichlet 
function, 329, 331 
principle 
and Riemann mapping 
theorem, 211 
justified by Hilbert, 211 
discrete process, 3 
distance, 97 
and Pythagorean theorem, 13 
definition of, 97 
distributive law, 308 
divergence of harmonic series, 141 
divisibility 
and Pythagorean triples, 5 
in Euclid, 5 
division of stakes, 80 
dodecahedron
rotation group, 270 
double periodicity, 178 
and complex integration, 215 
and Riemann, 178, 215 
of Weierstrass ℘-function, 217
double point, 89, 163 
double root, 164 
doubling the arc 
of circle, 173 
of lemniscate, 173, 174 
du Bois-Reymond, 334 
duplication of the cube, 17, 24 
by cissoid, 29 
by intersecting conics, 27 
by Menaechmus, 27 
Dürer, 100
Dyck 
concept of group, 268 
groups and tessellations, 271 


E 
e is transcendental, 25 
Eisenstein 
and algebraic integers, 306 
series, 216 
elastica, 170 
elimination, 65, 95 
and linear algebra, 120 
and polynomial equations, 66 
Gaussian, 65 
ellipse, 26 
arc length, 157, 170 
as planetary orbit, 27 
versus Cassini oval, 30 
focus of, 28 
not an elliptic curve, 170 
string construction, 28 
elliptic 
curves, 36, 157, 167, 170, 218 
addition of points, 220 
and Fermat’s last theorem, 158, 
218 
isomorphic to C/Λ, 220
parameterized by ℘, ℘′, 220
functions, 38, 87, 138, 157, 
165, 170 
addition theorem, 167 
and complex numbers, 177, 
205 
and the torus, 203 
birth day, 175 
by inverting integrals, 177 
double periodicity, 178, 215 
series expansions, 216 
integrals, 157, 170 
addition theorem, 175 
not elementary, 171 
elliptic modular functions see 
modular functions, 78 
empty set, 327 
epicycles, 31 
equation 
cubic, 73 
solution, 73 
Diophantine, 35 
equation, 63 
linear, 63, 65 
modular, 280 
Pell’s, 35, 43 
polynomial, 64 
quadratic, 63 
Brahmagupta formula, 64 
in Babylon, 64 
in Euclid, 64 
quartic, 77 
quintic, 63, 78 
equivalence relation, 20 
defined by group, 275 


Euclid, 4 
Elements, 4, 17 
Book V, 51, 54 


common notions, 19, 275 
postulates, 18 
perfect number theorem, 38 
and geometric series, 61 
proofs of Pythagorean 
theorem, 11 
Pythagorean triples formula, 4
theory of divisibility, 5 

theory of irrationals, 71 

used induction, 41 

view of quadratic equations, 64 


Euclidean 


algorithm, 35, 39 
for Gaussian integers, 310 
for polynomials, 168, 309 
geometry, 97 
on horosphere, 244 
on torus, 292 
plane, 97 
rigid motions, 253 
tessellations, 252 


Eudoxus, 51 


definition of equality, 54 
method of exhaustion, 56 
theory of proportions, 54 


Euler 


addition theorems, 167, 175 
and Bézout’s theorem, 96 
and chord-tangent construction, 165 
and complex logarithms, 207 
and conformal mapping, 210 
characteristic, 283, 285 
and genus, 289 
Poincaré generalization, 286 
constant, 142 
and zeta function, 155 
continued fraction formula, 133 
cotangent series, 216 
formula for e, 207 
geodesic differential equation, 235 
pentagonal number theorem, 37 
perfect number theorem, 38 
polyhedron formula, 285 
Legendre proof, 285 
product formula, 153 
proof of Fermat’s little theorem, 83, 
257 
summed Σ 1/n², 151
used algebraic integers, 308
values of ζ(s), 154
zeta function formula, 139, 153
exhaustion see method of exhaustion, 
56 
exponential function, 135 
addition formula, 208 
complex, 205, 207 
periodicity, 205, 207 
extreme value theorem, 191, 192 


F 
factor theorem, 79, 152 
Fagnano, 165 
addition theorem, 167 
duplication formula, 174 
and modular equations, 280 
studied by Euler, 175 
lemniscate division, 179 
Fermat, 85, 157 
and algebraic geometry, 13, 87 
and Diophantus, 48 
and Diophantus’s method, 7 
and rational right triangles, 159 
infinite descent, 159 
integration formula, 126 
last theorem, 158 
and elliptic curves, 218 
attempt by Lindemann, 25 
for n = 4, 159 
little theorem, 82, 257 
proof using inverses, 260 
Observations on Diophantus, 
158 
tangent method, 128 
applied to folium, 128 
Ferrari, 73 
solution of quartic, 77 
Fibonacci 
and cubic irrationals, 71 
field, 72, 265, 297 
algebraic number, 303 
and vector space, 301 
as vector space, 303, 311 
axioms, 302 
finite, 280, 303 
generated by algebraic number,
305 
of algebraic functions, 318 
of all algebraic numbers, 313 
of congruence classes mod p, 303 
of congruence classes mod p(x), 312 
of finite dimension over Q, 315 
Fior, 73 
focus, 28 
in astronomy, 29 
folium 
asymptote, 90 
double point, 89 
drawn by Huygens, 90 
has genus 0, 163 
of Descartes, 89 
parameterization, 90 
tangent of, 128 
foundations 
arithmetic and set-theoretic, 54 
geometric, 54 
of geometry, 106 
four-square theorem, 37 
Fourier, 329 
Frege, 339 
Friedman, 345 
function 
algebraic, 138, 146 
choice, 331 
computable, 336 
continuous, 191 
differentiable, 205 
Dirichlet, 329, 331 
elementary, 170 
elliptic, 138 
linear fractional, 259 
many-valued, 149, 205, 214 
modular, 78, 178 
rational, 146 
symmetric, 264 
theta, 38, 178 
transcendental, 138 
zeta, 153 
fundamental group, 283, 294 
as group of motions, 294
defined by Poincaré, 295
generators and relations, 283, 295
of sphere, 283
of torus, 283
fundamental polygon, 288
and universal covering, 291
for genus 2, 292
for torus, 291
fundamental theorem
of algebra, 181, 189
and Bézout’s theorem, 193, 195
and intersections, 195
d’Alembert proof, 190
Gauss proofs, 190
Kronecker version, 313
motivated by integration, 188
real version, 190
of arithmetic, 41
of calculus, 137
generalized, 213
in Leibniz formalism, 137


G
Galois
and modular equations, 280
and normal subgroups, 262, 265
and the quintic, 78, 266
discovered finite fields, 280, 303
discovered simple groups, 279
introduced group concept, 257, 265
theory, 265
and regular polyhedra, 22
theory of fields, 265
gamma function, 155
Gauss
and circle division, 25
and complex integration, 212
and conformal mapping, 210
and elliptic functions, 177
and lemniscate division, 179
and modular functions, 78, 178
and the agM, 178
area of hyperbolic circle, 240
construction of 17-gon, 25
curvature, 225, 233
formula for sphere motion, 254
fundamental theorem of algebra, 190
geodesic curvature, 236
geodesy, 232
sphere, 198
theorema egregium, 233
triangle tessellation, 255, 292
used algebraic integers, 308
Gaussian
curvature, 225, 233
elimination, 63, 65, 298
Gaussian integer, 306
Euclidean algorithm, 310
unique prime factorization, 310
generators and relations, 271, 275
and topology, 277
read off tessellation, 277
genus, 162
and Euler characteristic, 289
and rational functions, 163
as number of holes, 204
of algebraic curve, 204, 283
topological meaning, 198
geodesic, 235
curvature, 236
differential equation, 235
mapped to straight line, 245
on cone, 236
on cylinder, 236
on pseudosphere, 236
on sphere, 235
geometric series, 51, 134
and area of parabola, 60
and volume of tetrahedron, 59
in Euclid, 61, 139
geometry
algebraic, 14, 29, 63, 86
analytic, 87, 97
complex interpretation, 252
descriptive, 104
differential, 226 
foundations of, 106 
hyperbolic, 241 
infinitesimal, 137 
non-Euclidean, 19, 87, 182, 
225, 234 
of surfaces, 245 
projective, 95, 99, 104 
spherical, 240 
Gödel
and continuum hypothesis, 328 
arithmetization, 341 
incompleteness theorem, 339 
and computability, 339 
“miracle” of computability, 335 
second theorem, 341 
in Hilbert and Bernays, 342 
golden ratio, 26 
golden rectangle, 21 
constructibility, 71 
Goursat, 214 
Grandi, 91 
Grassmann 
and vector spaces, 299 
and induction, 42 
and inner product, 14 
and Steinitz exchange lemma, 
301 
Green, 213 
Green’s theorem, 213 
implies Cauchy’s theorem, 213 
Gregory, 147 
and interpolation, 148 
and Taylor’s theorem, 147 
Gregory–Newton formula, 147


group 
abelian, 262 
simple, 279 


alternating, 266 
associativity, 267 
cancellation, 268 
concept of Galois, 265 
cyclic, 259 


and radicals, 265 

simple, 279 
defining properties, 258 
fundamental, 283, 294 
identity, 258 
inverse, 258 
isomorphism, 267, 268 
isomorphism problem, 295 
of motions, 272 
of permutations, 265 
of real projective line, 273 
of rigid motions, 252 
of transformations, 257, 272 
on a cubic curve, 267
polyhedral, 269 

and theory of equations, 271 
presentation, 271 
quotient, 262 
rotation, 269 
Sn, 263 
simple, 279 

smallest nonabelian, 279 
smallest nonabelian, 278 
solvable, 265 
symmetric, 263 
word problem, 338 

group theory, 17, 223, 257 

and theory of equations, 265 
combinatorial, 275, 276 
geometric, 276 


H 
Hadamard, 187 
halting problem, 337 
Hamilton 
presented icosahedral group, 271 
handle, 290 